Chapter 6 Comparing Two Groups by the t-test

6.1 The t-test for Unpaired Plots

An individual unit in a population may be characterized in a number of different ways. A single tree, for example, can be described as alive or dead, hardwood or softwood, infected or uninfected, and so forth. When dealing with observations of this type we usually want to estimate the proportion of a population having a certain attribute. Or, if there are two or more different groups, we will often be interested in testing whether or not the groups differ in the proportions of individuals having the specified attribute. Some methods of handling these problems have been discussed in previous sections.

Alternatively, we might describe a tree by a measurement of some characteristic such as its diameter, height, or cubic volume. For this measurement type of observation we may wish to estimate the mean for a group as discussed in the section on sampling for measurement variables. If there are two or more groups we will frequently want to test whether or not the group means are different. Often the groups will represent types of treatment which we wish to compare. Under certain conditions, the \(t\) or \(F\) tests may be used for this purpose.

Both of these tests have a wide variety of applications. For the present we will confine our attention to tests of the hypothesis that there is no difference between treatment (or group) means. The computational routine depends on how the observations have been selected or arranged. The first illustration of a \(t\) test of the hypothesis that there is no difference between the means of two treatments assumes that the treatments have been assigned to the experimental units completely at random. Except for the fact that there are usually (but not necessarily) an equal number of units or “plots” for each treatment, there is no restriction on the random assignment of treatments.

In this example the “treatments” were two families of white pine which were to be compared on the basis of their volume production over a specified period of time. Twenty-two square one-acre plots were staked out for the study. Eleven of these were selected entirely at random and planted with seedlings of Family A. The remaining eleven were planted with seedlings of Family B. After the prescribed time period the pulpwood volume (in cords) was determined for each plot. The results were as follows:

Family A: 11, 8, 10, 8, 5, 10, 8, 8, 9, 11, 11    Sum = 99    Average = 9.0
Family B: 9, 9, 6, 10, 6, 13, 5, 7, 9, 8, 6    Sum = 88    Average = 8.0

To test the hypothesis that there is no difference between the family means (sometimes referred to as a null hypothesis) we compute: \[t=\frac{\bar X_A-\bar X_B}{\sqrt{\frac{s^2(n_A+n_B)}{n_A n_B}}}\]

where:

\(\bar X_A\) and \(\bar X_B\)=The arithmetic means for groups A and B.

\(n_A\) and \(n_B\)=The number of observations in groups A and B (\(n_A\) and \(n_B\) do not have to be the same).

\(s^2\)=The pooled within-group variance (calculation shown below).

To compute the pooled within-group variance, we first get the corrected sum of squares (SS) within each group.

\[SS_A=\Sigma X^2_A-\frac {(\Sigma X_A)^2}{n_A}=11^2+8^2+...+11^2-\frac {(99)^2}{11}=34\]

\[SS_B=\Sigma X^2_B-\frac {(\Sigma X_B)^2}{n_B}=9^2+9^2+...+6^2-\frac {(88)^2}{11}=54\]

Then the pooled variance is

\[s^2=\frac{SS_A+SS_B}{(n_A-1)+(n_B-1)}=\frac{88}{20}=4.4\]
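The corrected sums of squares and the pooled variance are easy to check by machine. The short Python sketch below is an illustration only (it is not part of the handbook); it uses the plot volumes listed above.

```python
# Sketch (illustration only): corrected sums of squares and pooled variance
# for the unpaired white pine example, using the plot volumes listed above.
family_a = [11, 8, 10, 8, 5, 10, 8, 8, 9, 11, 11]   # sum = 99, mean = 9.0
family_b = [9, 9, 6, 10, 6, 13, 5, 7, 9, 8, 6]      # sum = 88, mean = 8.0

def corrected_ss(x):
    """Corrected sum of squares: sum(X^2) - (sum(X))^2 / n."""
    return sum(v * v for v in x) - sum(x) ** 2 / len(x)

ss_a = corrected_ss(family_a)                        # 34.0
ss_b = corrected_ss(family_b)                        # 54.0
pooled_var = (ss_a + ss_b) / ((len(family_a) - 1) + (len(family_b) - 1))   # 88 / 20 = 4.4
print(ss_a, ss_b, pooled_var)
```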

Hence,

\[t=\frac{9.0-8.0}{\sqrt{4.4\left(\frac{11+11}{(11)(11)}\right)}}=\frac{1.0}{\sqrt{0.8}}=1.118\]

This value of \(t\) has \((n_A-1)+(n_B-1)\) degrees of freedom. If it exceeds the tabular value of \(t\) (table 2) at a specified probability level, we would reject the hypothesis. The difference between the two means would be considered significant (larger than would be expected by chance if there is actually no difference).

In this case, tabular \(t\) with 20 degrees of freedom at the 0.05 level is 2.086. Since our sample value is less than this, the difference is not significant at the 0.05 level.
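To make the whole computation easy to reproduce, here is a brief Python sketch (an illustration, not part of the original handbook). It follows the formulas above and, assuming the SciPy library is available, checks the result against scipy.stats.ttest_ind, which performs the same pooled-variance test; stats.t.ppf supplies the tabular \(t\) value.

```python
# Sketch (illustration only): unpaired (pooled-variance) t test for the white pine data.
from math import sqrt
from scipy import stats   # assumes SciPy is installed

family_a = [11, 8, 10, 8, 5, 10, 8, 8, 9, 11, 11]
family_b = [9, 9, 6, 10, 6, 13, 5, 7, 9, 8, 6]

def corrected_ss(x):
    """Corrected sum of squares: sum(X^2) - (sum(X))^2 / n."""
    return sum(v * v for v in x) - sum(x) ** 2 / len(x)

n_a, n_b = len(family_a), len(family_b)
mean_a, mean_b = sum(family_a) / n_a, sum(family_b) / n_b
df = (n_a - 1) + (n_b - 1)

pooled_var = (corrected_ss(family_a) + corrected_ss(family_b)) / df      # 4.4
t = (mean_a - mean_b) / sqrt(pooled_var * (n_a + n_b) / (n_a * n_b))     # about 1.118

t_crit = stats.t.ppf(0.975, df)                     # tabular t at the 0.05 level: about 2.086
t_scipy, p_value = stats.ttest_ind(family_a, family_b, equal_var=True)  # same t statistic

print(round(t, 3), round(t_crit, 3), round(t_scipy, 3), round(p_value, 3))
```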

Requirements–One of the unfortunate aspects of the \(t\) test and other statistical methods is that almost any kind of numbers can be plugged into the equations. But if the numbers and methods of obtaining them do not meet certain requirements, the result may be a fancy statistical facade with nothing behind it. In a handbook of this scope it is not possible to make the reader aware of all of the niceties of statistical usage, but a few words of warning are certainly appropriate.

A fundamental requirement in the use of most statistical methods is that the experimental material be a random sample of the population to which the conclusions are to be applied. In the \(t\) test of white pine families, the plots should be a sample of the sites on which the pines are to be grown, and the planted seedlings should be a random sample representing the particular family. A test conducted in one corner of an experimental forest may yield conclusions that are valid only for that particular area or sites that are about the same. Similarly, if the seedlings of a particular family are the progeny of a small number of parents, their performance may be representative of those parents only, rather than of the family.

In addition to the requirement that the observations be a random sample, the \(t\) test described above assumes that the population of such observations follows the normal distribution. With only a few observations, it is usually impossible to determine whether or not this assumption has been met. Special studies can be made to check on the distribution, but often the question is left to the judgment and knowledge of the research worker.

Finally, the \(t\) test of unpaired plots assumes that each group (or treatment) has the same population variance. Since it is possible to compute a sample variance for each group, this assumption can be checked with Bartlett’s test for homogeneity of variance. Most statistical textbooks present variations of the \(t\) test that may be used if the group variances are unequal.
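As an illustration (not part of the original handbook), both checks are available in common statistical software. The sketch below assumes the SciPy library; it applies Bartlett's test to the two families and also computes Welch's unequal-variance form of the \(t\) test, one of the variations referred to above.

```python
# Sketch (illustration only): checking the equal-variance assumption
# and computing Welch's unequal-variance t test for the same data.
from scipy import stats   # assumes SciPy is installed

family_a = [11, 8, 10, 8, 5, 10, 8, 8, 9, 11, 11]
family_b = [9, 9, 6, 10, 6, 13, 5, 7, 9, 8, 6]

# Bartlett's test of the hypothesis that the two group variances are equal.
bartlett_stat, bartlett_p = stats.bartlett(family_a, family_b)

# Welch's t test (equal_var=False) does not assume equal group variances.
t_welch, p_welch = stats.ttest_ind(family_a, family_b, equal_var=False)

print(round(bartlett_stat, 3), round(bartlett_p, 3), round(t_welch, 3), round(p_welch, 3))
```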

6.1.1 Sample size

If there is a real difference of \(D\) cords between the two families of white pine, how many replicates (plots) would be needed to show that it is significant? To answer this, we first assume that the number of replicates will be the same for each group \((n_A=n_B=n)\). The equation for \(t\) can then be written: \[t=\frac{D}{\sqrt{\frac{2s^2}{n}}} \text{ or } n=\frac{2t^2s^2}{D^2}\]

Next we need an estimate of the within-group variance, \(s^2\). As usual, this must be determined from previous experiments, or by special study of the populations.

Example–Suppose that we plan to test at the 0.05 level and wish to detect a true difference of \(D=1\) cord if it exists. From previous tests we estimate \(s^2=5.0\). Thus we have: \[n=\frac{2t^2s^2}{D^2}=2t^2\left(\frac{5.0}{1.0}\right)\]

Here we hit a snag. In order to estimate \(n\) we need a value for \(t\), but the value of \(t\) depends on the number of degrees of freedom, which depends on \(n\). The situation calls for an iterative solution–a fancy name for trial and error. We start with a guessed value of \(n\), say \(n_0=20\). As \(t\) has \((n_A-1)+(n_B-1)=2(n-1)\) degrees of freedom, we’ll use \(t=2.025\) \((=t_{0.05}\text{ with 38 df})\) and compute: \[n_1=2(2.025)^2\left(\frac{5.0}{1.0}\right)=41\]

The proper value of \(n\) will be somewhere between \(n_0\) and \(n_1\)–much closer to \(n_1\) than to \(n_0\). We can now make a second guess at \(n\) and repeat the process. If we try \(n_2=38\), \(t\) will have \(2(n-1)=74\) df and \(t_{0.05}=1.992\). Hence, \[n_3=2(1.992)^2\left(\frac{5.0}{1.0}\right)=39.7\]

Thus, \(n\) appears to be over 39 and we will use \(n=40\) plots for each group or a total of 80 plots.
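The trial-and-error search is easily automated. The sketch below is an illustration only (not part of the handbook); it assumes the SciPy library for the tabular \(t\) values and repeats the calculation until the guessed and computed \(n\) agree.

```python
# Sketch (illustration only): iterative sample-size calculation for the
# unpaired design, n = 2 t^2 s^2 / D^2 with t taken at 2(n - 1) df.
import math
from scipy import stats   # assumes SciPy is installed

def replicates_unpaired(s2, d, alpha=0.05, n_guess=20):
    """Repeat n = 2 t^2 s^2 / D^2 until the guessed n stops changing."""
    while True:
        t = stats.t.ppf(1 - alpha / 2, 2 * (n_guess - 1))   # two-tailed tabular t
        n_new = math.ceil(2 * t ** 2 * s2 / d ** 2)
        if n_new == n_guess:
            return n_new
        n_guess = n_new

# Values from the example: s^2 = 5.0, D = 1 cord, 0.05 level.
print(replicates_unpaired(5.0, 1.0))   # 40 plots per family, 80 in all
```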

6.2 The t Test for Paired Plots

A second test was made of the two families of white pine. It also had 11 replicates of each family, but instead of the two families being assigned completely at random over the 22 plots, the plots were grouped into 11 pairs and a different family was randomly assigned to each member of a pair. The cordwood volumes at the end of the growth period were

Plot pair          1    2    3    4    5    6    7    8    9   10   11    Sum   Mean
Family A          12    8    8   11   10    9   11   11   13   10    7    110   10.0
Family B          10    7    8    9   11    6   10   11   10    8    9     99    9.0
\(d_i=A_i-B_i\)    2    1    0    2   -1    3    1    0    3    2   -2     11    1.0

As before, we wish to test the hypothesis that there is no real difference between the family means.

The value of \(t\) when the plots have been paired is \[t=\frac{\bar X_A-\bar X_B}{\sqrt{\frac{s^2_d}{n}}}=\frac{\bar d}{\sqrt{s^2_{\bar d}}}, \text{ with } (n-1) \text{ degrees of freedom}\]

where:

\(n\)=The number of pairs of plots.

\(s^2_d\)=The variance of the individual differences between \(A\) and \(B\):

\[s^2_d=\frac{\Sigma d^2_i-\frac{(\Sigma d_i)^2}{n}}{n-1}=\frac{2^2+1^2+...+(-2)^2-\frac{11^2}{11}}{10}=2.6\]

So, in this example we find \[t_{10df}=\frac {10.0-9.0}{\sqrt {2.6/11}}=2.057\]

Comparing this to the tabular value of \(t\) \((t_{0.05} \text{ with 10 df}=2.228)\), we find that the difference is not significant at the 0.05 level. That is, a sample mean difference of 1 cord or more could have occurred by chance more than one time in twenty even if there is no real difference between the family means. Usually, such an outcome is not regarded as sufficiently strong evidence to reject the hypothesis.
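For readers who wish to reproduce the paired computation, here is a brief Python sketch (an illustration, not part of the handbook). Assuming SciPy is available, it also checks the result against scipy.stats.ttest_rel, which performs the same paired test.

```python
# Sketch (illustration only): paired-plot t test for the second white pine trial.
from math import sqrt
from scipy import stats   # assumes SciPy is installed

family_a = [12, 8, 8, 11, 10, 9, 11, 11, 13, 10, 7]
family_b = [10, 7, 8, 9, 11, 6, 10, 11, 10, 8, 9]

d = [a - b for a, b in zip(family_a, family_b)]   # plot-by-plot differences
n = len(d)
d_bar = sum(d) / n                                # 1.0

# Variance of the individual differences between A and B.
s2_d = (sum(x * x for x in d) - sum(d) ** 2 / n) / (n - 1)   # 2.6

t = d_bar / sqrt(s2_d / n)                        # about 2.057
t_crit = stats.t.ppf(0.975, n - 1)                # tabular t with 10 df: about 2.228

t_scipy, p_value = stats.ttest_rel(family_a, family_b)        # same t statistic
print(round(t, 3), round(t_crit, 3), round(t_scipy, 3), round(p_value, 3))
```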

The paired test will be more sensitive (capable of detecting smaller real differences) than the unpaired test whenever the experimental units (plots in this case) can be grouped into pairs such that the variation between pairs is appreciably larger than the variation within pairs. The basis for pairing plots may be geographic proximity or similarity in any other characteristic that is expected to affect the performance of the plot. In animal-husbandry studies, litter mates are often paired, and where patches of human skin are the “plots,” the left and right arms may constitute the pair. If the experimental units are very homogeneous, there may be no advantage in pairing.

6.2.1 Number of replicates

The number \((n)\) of plot pairs needed to detect a true mean difference of size \(D\) is: \[n=\frac {t^2s_d^2}{D^2}\]

N.B.: Be sure to use the variance of the difference \((s^2_d)\) between paired plots in this equation and not the variance among plots.
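The same trial-and-error approach used for the unpaired design applies here. The sketch below is an illustration only (not from the handbook); it assumes SciPy for the tabular \(t\) values and, purely as an example, uses the variance of the differences from the paired test above \((s^2_d=2.6)\) and \(D=1\) cord.

```python
# Sketch (illustration only): number of plot pairs, n = t^2 s_d^2 / D^2,
# with t taken at (n - 1) df, solved by the same trial-and-error loop.
import math
from scipy import stats   # assumes SciPy is installed

def pairs_needed(s2_d, d, alpha=0.05, n_guess=10):
    """Repeat n = t^2 s_d^2 / D^2 until the guessed n stops changing."""
    while True:
        t = stats.t.ppf(1 - alpha / 2, n_guess - 1)   # two-tailed tabular t
        n_new = math.ceil(t ** 2 * s2_d / d ** 2)
        if n_new == n_guess:
            return n_new
        n_guess = n_new

# Purely illustrative values: s_d^2 = 2.6 (from the paired example), D = 1 cord.
print(pairs_needed(2.6, 1.0))   # 13 pairs with these values
```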