Chapter 1 General Concepts

1.1 Statistics - What For?

To the uninitiated it may often appear that the statistician’s primary function is to prevent or at least impede the progress of research. And even those who suspect that statistical methods may be more boon than bane are at times frustrated in their efforts to make use of the statistician’s wares.

Much of the difficulty is due to not understanding the basic objectives of statistical methods. We can boil these objectives down to two:

  1. The estimation of population parameters (values that characterize a particular population).
  2. The testing of hypotheses about these parameters.

A common example of the first is the estimation of the coefficients \(a\) and \(b\) in the linear relationship, \(Y=a+bX\), between the variables \(Y\) and \(X\). To accomplish this objective one must first define the population involved and specify the parameters to be estimated. This is primarily the research worker’s job. The statistician helps devise efficient methods of collecting data and calculating the desired estimates.
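As a concrete illustration of this first objective, the least-squares estimates of \(a\) and \(b\) can be computed directly from a sample of paired observations. The data below are hypothetical, chosen only so the fit comes out exact:

```python
# Least-squares estimates of the intercept a and slope b in Y = a + bX.
# The sample data are hypothetical; here Y = 2 + 3X exactly.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [5.0, 8.0, 11.0, 14.0, 17.0]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# b = sum((x - x̄)(y - ȳ)) / sum((x - x̄)²);  a = ȳ - b·x̄
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
b = sxy / sxx
a = mean_y - b * mean_x

print(f"a = {a:.3f}, b = {b:.3f}")
```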

Unless the whole population is examined, an estimate of a parameter is likely to differ to some degree from the population value. The unique contribution of statistics to research is that it provides ways of evaluating how far off the estimate may be. This is ordinarily done by computing confidence limits, which have a known probability of including the true value of the parameter. Thus, the mean diameter of the trees in a pine plantation may be estimated from a sample as 9.2 inches, with 95-percent confidence limits of 8.8 and 9.6 inches. These limits (if properly obtained) tell us that, unless a one-in-twenty chance has occurred in sampling, the true mean diameter is somewhere between 8.8 and 9.6 inches.
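A sketch of how such limits arise, using the large-sample normal approximation. The sample size and standard deviation below are assumptions, picked so that the result reproduces the pine-plantation figures:

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical summary figures chosen to match the example in the text:
# a sample of n = 100 trees with mean 9.2 in. and standard deviation 2.04 in.
n = 100
mean = 9.2
s = 2.04

se = s / sqrt(n)                 # standard error of the mean
z = NormalDist().inv_cdf(0.975)  # about 1.96 for 95-percent limits
lower, upper = mean - z * se, mean + z * se

print(f"95% confidence limits: {lower:.1f} to {upper:.1f} inches")
```

With small samples the \(t\) distribution, not the normal, supplies the multiplier, but the arithmetic is the same.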

The second basic objective in statistics is to test some hypothesis about the population parameters. A common example is a test of the hypothesis that the regression coefficient \(b\) in the linear model \(Y=a + bX\) has some specified value (say zero). Another example is a test of the hypothesis that the difference between the means of two populations is zero.

Again, it is the research worker who should formulate meaningful hypotheses to be tested, not the statistician. This task can be tricky, and the beginner would do well to work with the statistician to be sure that the hypothesis is put in a form that can be tested. Once the hypothesis is set, it is up to the statistician to work out ways of testing it and to devise efficient procedures for obtaining the data.

This handbook describes some of the methods of estimating certain parameters and testing some of the more common hypotheses.

1.2 Probability and Statistics

It is fairly well known that statisticians work with probabilities. They are supposed to know, for example, the likelihood of tossing a coin heads up six times in a row, or the chances of a crapshooter making seven consecutive winning throws (“passes”), and many other such useful bits of information. (This is assumed to give them an edge in games of chance, but often other factors enter in.)

Despite the recognized association of statisticians with probability, the fundamental role of probability in statistical work is often not appreciated. In putting confidence limits on an estimated parameter, the part played by probability is fairly obvious. Less apparent to the neophyte is the operation of probability in the testing of hypotheses. Some say with derision, “You can prove anything with statistics.” The truth is, you can prove nothing; you can at most compute the probability of something happening and let the researcher draw his own conclusions.

To illustrate with a very simple example, in the game of craps the probability of the shooter winning (making a pass) is approximately 0.493 – assuming, of course, a perfectly balanced set of dice and an honest shooter. Suppose now that you run up against a shooter who picks up the dice and immediately makes seven passes in a row! It can be shown that if the probability of making a single pass is really 0.493, then the probability of seven or more consecutive passes is about 0.007 (or 1 in 141). This is where statistics end; you draw your own conclusions about the shooter. If you conclude that the shooter is pulling a fast one, then in statistical terms you are rejecting the hypothesis that the probability of the shooter making a single pass is 0.493.
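The figures in this example can be checked directly: since seven or more consecutive passes requires making each of the first seven, the probability is simply \(0.493^{7}\).

```python
p_pass = 0.493        # probability of a single pass with honest dice
p_seven = p_pass ** 7  # seven passes in a row (hence "seven or more")

print(f"P(7 straight passes) = {p_seven:.4f}  (about 1 in {1 / p_seven:.0f})")
```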

Most statistical tests are of this nature. A hypothesis is formulated and an experiment is conducted or a sample is selected to test it. The next step is to compute the probability of the experimental or sample results occurring by chance if the hypothesis is true. If this probability is less than some preselected value (perhaps 0.05 or 0.01), the hypothesis is rejected. Note that nothing has been proved – we haven’t even proved that the hypothesis is false. We merely inferred this because of the low probability associated with the experiment or sample results.
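The decision rule just described can be stated in a few lines. The probability 0.007 is the one computed for the crap-shooting example:

```python
def reject(p_value: float, alpha: float) -> bool:
    """Reject the hypothesis when the chance probability of the
    observed result falls below the preselected level of testing."""
    return p_value < alpha

p_value = 0.007  # probability of the observed result if the hypothesis is true

print(reject(p_value, 0.05))   # rejected at the 0.05 level
print(reject(p_value, 0.01))   # rejected at the 0.01 level as well
print(reject(p_value, 0.001))  # not rejected at the 0.001 level
```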

Obviously our inferences may be wrong if we are given inaccurate probabilities. Reliable computation of these probabilities requires a knowledge of how the variable we are dealing with is distributed (that is, what the probability is of the chance occurrence of different values of the variable). Thus, if we know that the number of beetles caught in light traps follows what is called the Poisson distribution, we can compute the probability of catching \(X\) or more beetles. But, if we assume that this variable follows the Poisson when it actually follows the negative binomial distribution, our computed probabilities may be in error.
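The size of such an error can be illustrated numerically. The sketch below assumes an arbitrary mean of 4 beetles per trap and an arbitrary negative binomial shape of \(r = 2\) (both figures are invented for illustration), and compares the chance of catching 10 or more beetles under the two distributions:

```python
from math import exp, factorial, comb

mu = 4.0  # assumed mean catch per trap
x = 10    # catch of interest: P(X >= 10)

# Poisson tail: P(X >= x) = 1 - sum_{k < x} e^(-mu) mu^k / k!
p_poisson = 1 - sum(exp(-mu) * mu**k / factorial(k) for k in range(x))

# Negative binomial with shape r = 2 and the same mean:
# mean = r(1 - p)/p = 4  =>  p = 1/3
r, p = 2, 1 / 3
p_negbin = 1 - sum(comb(k + r - 1, k) * p**r * (1 - p) ** k for k in range(x))

print(f"Poisson:            P(X >= 10) = {p_poisson:.4f}")
print(f"Negative binomial:  P(X >= 10) = {p_negbin:.4f}")
```

Both distributions have the same mean, yet the tail probabilities differ nearly tenfold, so a test built on the wrong distribution would badly misstate the chance of an extreme catch.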

Even with reliable probabilities, statistical tests can lead to the wrong conclusions. We will sometimes reject a hypothesis that is true. If we always test at the 0.05 level, we will make this mistake, on the average, 1 time in 20. We accept this degree of risk when we select the 0.05 level of testing. If we’re willing to take a bigger risk, we can test at the 0.10 or 0.25 level. If we’re not willing to take this much risk, we can test at the 0.01 or 0.001 level.

The fellow who always wears both a belt and suspenders might, at this point, conclude that he should always test at the 0.00001 level. Then he’d be wrong only 1 time in 100,000. But a researcher can make more than one kind of error. In addition to rejecting a hypothesis that is true (called a Type I error), he can make the mistake of not rejecting a hypothesis that is false (called a Type II error). In crap shooting, it is a mistake to accuse an honest shooter of cheating (Type I error – rejecting true hypothesis), but it is also a mistake to trust a dishonest shooter (Type II error – failure to reject a false hypothesis).

The difficulty is that for a given set of data, reducing the risk of one kind of error increases the risk of the other kind. If we set 15 straight passes as the critical limit for a crap shooter, then we greatly reduce the risk of making a false accusation (probability about 0.000025). But in so doing we have dangerously increased the probability of making a Type II error – failure to detect a phony. A critical step in designing experiments is the attainment of an acceptable level of probability for each type of error. This is usually accomplished by specifying the level of testing (i.e., the probability of an error of the first kind) and then making the experiment large enough to attain an acceptable level of probability for errors of the second kind.
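The tradeoff can be made concrete with a small sketch. The honest shooter's single-pass probability (0.493) comes from the text; the dishonest shooter's probability of 0.6 is purely an assumption for illustration:

```python
p_honest = 0.493  # single-pass probability with fair dice (from the text)
p_cheat = 0.6     # assumed single-pass probability for a dishonest shooter

for k in (7, 15):  # critical limit: accuse after k straight passes
    alpha = p_honest ** k    # Type I: falsely accusing an honest shooter
    beta = 1 - p_cheat ** k  # Type II: cheat fails to reach k straight passes
    print(f"limit {k:2d}: P(Type I) = {alpha:.6f}, P(Type II) = {beta:.4f}")
```

Raising the limit from 7 to 15 passes cuts the Type I risk by a factor of several hundred, but the chance of letting this particular cheat go undetected climbs toward certainty.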

It is beyond the scope of this handbook to go into basic probability computations, distribution theory, or the calculation of Type II errors. But anyone who uses statistical methods should be fully aware that he is dealing with probabilities and not with immutable absolutes. The results of a \(t\), \(F\), or \(\chi^2\) test must be interpreted with this in mind. It is also well to remember that one-in-twenty chances do actually occur – about one time out of twenty.