deSigns of the Times: Or, When an F Is a Passing Grade. . .

Editor's Note: This article is part 5 in a series. See PCI February 2001 for the previous installment.

In a previous article, "deSigns of the Times: Or, t-ing off" (PCI February 2001), the calculation of the t statistic was defined. In this article, the variance (the square of the standard deviation) and the F statistic are described as an alternative method for deciding whether two samples are different.

What Does ANOVA Mean?

The term ANOVA is an acronym for analysis of variance. ANOVA is a statistical technique used to test hypotheses about two or more items within a given treatment (for example, do different lots of raw material behave the same or differently? Does changing a reaction temperature change the yield?) or between treatments (for example, do reaction temperature changes or reaction time changes have the greater effect?). With ANOVA, a researcher can tell which variables must be controlled, and at which levels, to make a process reproducible or to maximize the desired result, and which variables can be ignored because they don't affect the process. As will be explained later in the article, a big "F" will be a passing grade for the researcher who uses this technique.

How are Two Treatments Compared?

As described in the previous article, the means of two treatments can be compared by calculating the t statistic (see Equation 1), where X is the average, s is the standard deviation, n is the number of replicates used to calculate X and s, the subscripts 1 and 2 refer to the two levels or treatments of the variable being compared, t is the statistic, cf is the desired confidence, and n1 + n2 - 2 is the number of degrees of freedom in the study.
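Equation 1 itself is not reproduced in this copy of the article. One standard form consistent with the definitions above is the pooled-variance two-sample t statistic written out below; treat it as a reconstruction rather than the author's exact notation. With the summary statistics of Example 1 it gives the reported t of 1.63. The calculated t is then compared with the tabulated value for confidence cf and n1 + n2 - 2 degrees of freedom.

```latex
t = \frac{\left|\bar{X}_1 - \bar{X}_2\right|}
         {\sqrt{\dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}
                \left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}
\qquad \text{compared with } t_{cf,\; n_1 + n_2 - 2}
```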

The t statistic allows an experimenter to determine whether two levels within a treatment or whether two different treatments give the same results.

In actual practice, the experimenter assumes that the two treatments will give the same results: the null hypothesis. He then compares the two treatments by running a number of replicates on each and calculating the average, the standard deviation and the t value. The experimenter then uses a published table of t values to look up the t value that would be expected if only experimental error were operating and no real difference between the items existed. If the calculated t value is larger than the t value in the table, the researcher can conclude that the hypothesis was wrong and that the two treatments do not give the same results.

Example 1:
A researcher wanted to know if two lots of polyester had the same acid number. He believed that they did. He measured five replicates on each resin: Polyester 1: 3.9, 3.6, 3.5, 3.4, 3.6; and Polyester 2: 3.8, 3.9, 3.6, 3.8, 3.7. The average acid value for Polyester 1 was 3.6 and for Polyester 2 was 3.76. The standard deviation for the acid value of Polyester 1 was 0.19 and for Polyester 2 was 0.11. The calculated t value was 1.63. A t Table (found in all statistical references) tells the researcher that he would be wrong 15 times in 100 if he said these polyesters had different acid numbers. These were not acceptable odds, so the researcher concluded that the polyesters had the same acid numbers. A statistician would state that "the two polyesters could not be proven to be different."

Example 2:
The researcher of Example 1 was still not satisfied, so he ran five more replicates the next day: Polyester 1, 3.6, 3.4, 3.6, 3.4, 3.5; and Polyester 2, 3.8, 3.6, 3.7, 3.8, 3.8. Combining the first results with the second, the average acid value was 3.55 for Polyester 1 and 3.75 for Polyester 2, while the standard deviation was 0.15 for Polyester 1 and 0.10 for Polyester 2. With 20 tests, the researcher has increased his power to make a decision. The calculated t value was 3.52. This time the t Table shows that the researcher would be wrong fewer than one time in 100 if he said these polyesters had different acid numbers. The researcher and the statistician would both conclude that, with the additional data, the polyesters did not have the same acid numbers.
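The t calculations in Examples 1 and 2 can be checked with a short Python sketch. It assumes the pooled-variance (equal-variance) form of the two-sample t statistic; the function name `pooled_t` is our own, and the two critical values are standard two-sided entries copied from a t table.

```python
from math import sqrt
from statistics import mean, stdev


def pooled_t(x1, x2):
    """Two-sample t statistic using a pooled (equal-variance) estimate."""
    n1, n2 = len(x1), len(x2)
    df = n1 + n2 - 2
    # stdev() divides by n - 1, matching the sample standard deviation s
    sp2 = ((n1 - 1) * stdev(x1) ** 2 + (n2 - 1) * stdev(x2) ** 2) / df
    se = sqrt(sp2 * (1 / n1 + 1 / n2))  # standard error of the difference in means
    return abs(mean(x1) - mean(x2)) / se, df


day1_p1 = [3.9, 3.6, 3.5, 3.4, 3.6]
day1_p2 = [3.8, 3.9, 3.6, 3.8, 3.7]
day2_p1 = [3.6, 3.4, 3.6, 3.4, 3.5]
day2_p2 = [3.8, 3.6, 3.7, 3.8, 3.8]

# Example 1: first day only
t1, df1 = pooled_t(day1_p1, day1_p2)
# Example 2: both days combined
t2, df2 = pooled_t(day1_p1 + day2_p1, day1_p2 + day2_p2)

# Two-sided critical values from a standard t table
T_95_8DF = 2.306   # 95% confidence, 8 df
T_99_18DF = 2.878  # 99% confidence, 18 df

print(f"Example 1: t = {t1:.2f} ({df1} df); different at 95%? {t1 > T_95_8DF}")
print(f"Example 2: t = {t2:.2f} ({df2} df); different at 99%? {t2 > T_99_18DF}")
```

Running it gives t of about 1.63 with 8 degrees of freedom (not different at 95%) and t of about 3.52 with 18 degrees of freedom (different even at 99%), which is the same pair of conclusions the researcher reached.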

Is There Another Way to Evaluate Data?

In Examples 1 and 2, the averages were used to look for differences between lots. Another way is to use an Analysis of Variance (ANOVA) Table to do the same thing.

What is Variance?

In another article, "deSigns of the Times: Or, the lowly standard deviation" (PCI March 2000, p. 64), the sample standard deviation, s, was described as the square root of the variance (see Equation 2).

The variance is calculated by subtracting the average from each data point to obtain the deviation, d (see Equation 3); squaring each deviation; summing the squared deviations (see Equation 4); and dividing this sum of the squares by the number of degrees of freedom, n-1, where n is the number of data points (see Equation 5). Taking the square root of the variance gives the standard deviation.
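The steps above can be followed one at a time in a minimal Python sketch, here applied to the ten Polyester 1 acid numbers from Examples 1 and 2. The variable names are illustrative, and the equation numbers in the comments refer to the steps as the text lists them.

```python
from math import sqrt

acid = [3.9, 3.6, 3.5, 3.4, 3.6, 3.6, 3.4, 3.6, 3.4, 3.5]  # Polyester 1, both days

n = len(acid)
avg = sum(acid) / n                # average, about 3.55
d = [x - avg for x in acid]        # deviations (Equation 3)
ss = sum(di ** 2 for di in d)      # sum of the squares (Equation 4)
var = ss / (n - 1)                 # variance, n - 1 degrees of freedom (Equation 5)
s = sqrt(var)                      # standard deviation (Equation 2)

print(f"SS = {ss:.3f}, s^2 = {var:.4f}, s = {s:.3f}")
```

The result, s of about 0.15, matches the standard deviation quoted for Polyester 1 in Example 2.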

What is ANOVA?

ANOVA works by calculating a separate variance for each source of variation as the levels of the variables are changed. In the previous example, one variance can be calculated for the difference between Polyester 1 and Polyester 2 and another for the experimental error. The sums of squares from which these variances are calculated are additive: together they make up the total variation in the data.

How Is an ANOVA Table Constructed?

Example 3:
The combined data of Examples 1 and 2 will be used.

The average for each polyester is calculated: 3.55 and 3.75.

The grand average using both sets of data is calculated: 3.65.

The grand sum of the squared deviations (often abbreviated to "grand sum of the squares" or simply Total SS) is calculated using all 20 data points and the grand average (see, for example, Equations 3 and 4): 0.49.

The sum of the squares within the two resins is calculated using the individual data points and the average of the resin each point belongs to: 0.29.

The sum of the squares between the two resins is calculated using each resin average, the grand average and the number of replicates: 0.20. This is summarized in Table 1.

The Total Sum of Squares is the sum of the Between-Resins and Within-Resins Sums of Squares. This means that only two of the three sums of squares need to be calculated long-hand. The third can be calculated from the others.
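A minimal Python sketch of the three sums of squares for Example 3 makes the additivity easy to check. The helper `mean` is written out so the block stands alone; nothing here is specific to any statistics package.

```python
p1 = [3.9, 3.6, 3.5, 3.4, 3.6, 3.6, 3.4, 3.6, 3.4, 3.5]  # Polyester 1, both days
p2 = [3.8, 3.9, 3.6, 3.8, 3.7, 3.8, 3.6, 3.7, 3.8, 3.8]  # Polyester 2, both days


def mean(xs):
    return sum(xs) / len(xs)


groups = [p1, p2]
data = p1 + p2
grand = mean(data)  # grand average, 3.65

# Total SS: every point against the grand average
ss_total = sum((x - grand) ** 2 for x in data)
# Within SS: every point against its own resin average
ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
# Between SS: each resin average against the grand average, weighted by replicates
ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)

print(f"Total SS         = {ss_total:.2f}")
print(f"Between SS       = {ss_between:.2f}")
print(f"Within SS        = {ss_within:.2f}")
print(f"Between + Within = {ss_between + ss_within:.2f}")
```

The printed values are 0.49, 0.20 and 0.29, and the last line reproduces the Total SS, which is why only two of the three sums need to be worked out long-hand.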

The ANOVA Table is completed by including the degrees of freedom, df, and calculating each variance by dividing its sum of squares by its df. The variance in an ANOVA Table is usually called the Mean Square.

In this example, two resins were compared, so the between-resins degrees of freedom is one. There are 10 analyses for each resin, or nine degrees of freedom for each resin, so the experimental error (within-resins) degrees of freedom is twice nine, or 18. The grand degrees of freedom is 19: the sum of the between-resins and within-resins degrees of freedom, or equivalently the total number of experiments minus one.

In this example, the within-resins Mean Square is the error variance. The standard deviation for this acid number test is the square root of the within-resins Mean Square (s² = 0.016); therefore, s = 0.127 with 18 degrees of freedom.
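The degrees of freedom and Mean Squares can be added to the sums of squares in a short one-way ANOVA sketch. The function name `one_way_anova` and its dictionary layout are our own choices for illustration.

```python
from math import sqrt


def one_way_anova(groups):
    """Degrees of freedom, Mean Squares and error standard deviation for a one-way ANOVA."""
    data = [x for g in groups for x in g]
    grand = sum(data) / len(data)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between = len(groups) - 1           # number of groups minus one
    df_within = len(data) - len(groups)    # (n - 1) summed over the groups
    df_total = len(data) - 1               # equals df_between + df_within
    ms_between = ss_between / df_between   # Mean Square = SS / df
    ms_within = ss_within / df_within      # the error variance
    return {
        "df": (df_between, df_within, df_total),
        "ms": (ms_between, ms_within),
        "s": sqrt(ms_within),              # standard deviation of the test
    }


p1 = [3.9, 3.6, 3.5, 3.4, 3.6, 3.6, 3.4, 3.6, 3.4, 3.5]  # Polyester 1, both days
p2 = [3.8, 3.9, 3.6, 3.8, 3.7, 3.8, 3.6, 3.7, 3.8, 3.8]  # Polyester 2, both days

table = one_way_anova([p1, p2])
print(table["df"])
print([round(ms, 4) for ms in table["ms"]])
print(round(table["s"], 3))
```

For the two polyesters this gives degrees of freedom of 1, 18 and 19, Mean Squares of 0.2 and about 0.0161, and s of about 0.127, as in the text.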

The above demonstrates calculations to isolate the variance between and within different lots of polyester, that is, the variance between the lots and the variance of the experimental error. But how does ANOVA tell whether Resin 1 is different than Resin 2? This requires the calculation of the F statistic and a comparison in an "F" test.

What is an F Statistic?

The challenge is to describe the F statistic, and how to use it to judge differences in samples or treatments, without going into its derivation.¹ Fsample is calculated by dividing the variance between the samples, s²between, by the error variance, s²error (see Equation 6).

This calculated Fsample is then compared to values in an F table. F tables are published in most statistics references. An excerpt of the table is given in Table 2.

A sample variance, s², is only an estimate of the population variance, σ². If two different samples are taken to calculate two variances, each s² is an independent estimate of the same σ². Through experimental variation alone, the ratio of the two sample variances could be as large as the tabulated F. If Fsample is larger than the F from the table, something more than experimental variation is happening, and this additional variation is attributed to the difference in the treatment.

Using the data from Example 3, the between-groups variance is the polyester variance (with 1 degree of freedom) and the within-groups variance is the experimental error (with 18 degrees of freedom), so the F value is calculated by dividing the between-groups variance by the within-groups variance (see Equation 7).
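Equations 6 and 7 are not reproduced in this copy. Written out from the definitions in the text, with subscript labels of our own choosing and the Mean Squares of Example 3, they read:

```latex
F_{\text{sample}} = \frac{s^2_{\text{between}}}{s^2_{\text{error}}}
\qquad\qquad
F_{\text{sample}} = \frac{0.200}{0.016} = 12.5
```

With the unrounded error variance, 0.29/18 ≈ 0.0161, the ratio is about 12.4; the conclusions that follow are the same.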

The next step is to look up the F value in an F Table. Since the between-resins term had 1 degree of freedom, the first column is used; since the within-resins term had 18 degrees of freedom, the eighteenth row is used. The F table shows that at 90% confidence, F1,18,90 = 3.01; at 95% confidence, F1,18,95 = 4.41; and at 99% confidence, F1,18,99 = 8.29.

F1,18,95 = 4.41 means that, at the 95% confidence level, the Between Resins Variance can be as large as 4.41 × 0.016 = 0.071 and still be due only to experimental error. The Between Resins Variance of Polyesters 1 and 2 is 0.200, which is larger than can be explained by experimental error alone; an experimenter would be wrong fewer than 1 time in 20 if he said the lots were different. Moreover, at 99% confidence, F1,18,99 = 8.29, so the Between Resins Variance can be as large as 8.29 × 0.016 = 0.133 and still be due only to experimental error. Even at the 99% confidence level, the Between Resins Variance of 0.200 is larger than can be explained by experimental error, and an experimenter would be wrong fewer than 1 time in 100 if he said the lots were different.

An easier way to use the F table is to go to the ANOVA table and compare the calculated Fsample with the F1,18 from the F Table for each confidence level (see Table 3).

A comparison of Fsample = 12.5 with the values in the table shows that there is less than 1 chance in 100 of being wrong in saying that these lots are different. The results of ANOVA are identical to the results of comparing the means with the t-test.
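The comparison with the F table, and the agreement with the t-test, can both be sketched in Python from the raw data of Example 3. The three F(1, 18) values are the ones quoted from the F table above; the code uses the unrounded error variance, so Fsample comes out near 12.4 rather than the 12.5 obtained from the rounded 0.016.

```python
from math import isclose, sqrt

p1 = [3.9, 3.6, 3.5, 3.4, 3.6, 3.6, 3.4, 3.6, 3.4, 3.5]  # Polyester 1, both days
p2 = [3.8, 3.9, 3.6, 3.8, 3.7, 3.8, 3.6, 3.7, 3.8, 3.8]  # Polyester 2, both days


def mean(xs):
    return sum(xs) / len(xs)


m1, m2, grand = mean(p1), mean(p2), mean(p1 + p2)
df_between, df_error = 1, len(p1) + len(p2) - 2
ms_between = (len(p1) * (m1 - grand) ** 2 + len(p2) * (m2 - grand) ** 2) / df_between
ms_error = (sum((x - m1) ** 2 for x in p1) + sum((x - m2) ** 2 for x in p2)) / df_error
f_sample = ms_between / ms_error

# F(1, 18) values as read from the F table in the text
for conf, f_crit in [(90, 3.01), (95, 4.41), (99, 8.29)]:
    verdict = "different" if f_sample > f_crit else "not proven different"
    print(f"{conf}% confidence: Fsample {f_sample:.1f} vs {f_crit}, lots {verdict}")

# Pooled two-sample t on the same data; with two groups, t squared equals F
t = abs(m1 - m2) / sqrt(ms_error * (1 / len(p1) + 1 / len(p2)))
print(f"t = {t:.2f}, t^2 = {t * t:.2f}, F = {f_sample:.2f}")
print("t^2 equals F:", isclose(t * t, f_sample))
```

All three comparisons report the lots as different, and the last lines show why the two methods agree: for two groups, the F from the ANOVA table is exactly the square of the pooled t.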

Why Construct a Complicated ANOVA Table When the t-test is Easier?

A t-test can only be used to compare two means at a time. When there are more than two groups in a treatment, use of the t-test would require that all pairs be compared. For example, if there were five in the group, then ten comparisons would have to be made. ANOVA can compare all five in one step.
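The number of pairwise t-tests grows quickly with the number of groups, k, as k(k - 1)/2. A tiny Python sketch, using hypothetical labels for the five polyesters, shows the count.

```python
from itertools import combinations
from math import comb

# Hypothetical labels for the five polyesters
resins = ["A", "B", "C", "D", "E"]
pairs = list(combinations(resins, 2))
print(len(pairs), "pairwise t-tests:", pairs)

# The count grows as k(k - 1)/2 with the number of groups k
for k in (2, 3, 5, 10):
    print(k, "groups ->", comb(k, 2), "comparisons")
```

Five groups need ten separate t-tests, while ten groups would need 45; a single ANOVA covers each case in one step.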

Example 4:
The acid numbers of five polyesters were to be compared to see if any were different. Six replicate determinations of acid number were made for each resin. The data is given in Table 4 and the Analysis of Variance is given in Table 5.

The F ratio of 52.2 tells that there is less than 1 chance in 10,000 of making a mistake if the experimenter says that these five polyesters are not the same. Figure 1 shows a plot of the data for each polyester together with comparison circles, which allow the experimenter to judge which resins are the same and which are different.

Overlapping comparison circles mean that the ANOVA cannot tell whether the resins are different. A comparison circle that is isolated indicates that the data disprove the hypothesis that a sample is the same as the others. Figure 1 shows one polyester with a high acid number, one with a low acid number and three polyesters whose acid numbers cannot be told apart. For these three resins, since one of the comparison circles does not overlap the other two exactly, additional experimentation on these three might show a difference.

This was an example of a one-way Analysis of Variance because only one variable was present.

How Can ANOVA Analysis Be Applied in a Coatings Lab?

Example 5:
Several years ago, a complaint was received that four lots of a commercial polymeric isocyanate gave two-component coatings with different Gardner dry-time results. The dry times were measured in the lab using the same pigmented polyol with each polyisocyanate lot (see Table 6).

These results led to a conclusion that indeed there did seem to be differences in the lots. A review of the production tests of these lots led to no definitive conclusions. At face value it seemed that lot D was slow to dry, lot C was fast to dry and lots A and B were in the middle. A statistically designed experiment was conducted to increase the number of degrees of freedom in order to increase the statistical "power." The dry times of these lots were evaluated over several days, shown in Table 7.

These results seemed to reveal a pattern in the performance of the lots and something about the experimental error. Table 8 shows the average and standard deviation of the tests run on each lot and of the tests run on each day, along with the grand mean and grand standard deviation for the test.

From these data, two hypotheses can be stated: 1) there are no differences between the lots; and 2) there are no day-to-day differences in dry time. The data seem to show that the lot averages fall into two groups, with two resins having longer dry times than the other two, and that the day averages don't fall into any pattern. To test these hypotheses, an ANOVA Table is constructed (see Table 9).

Since days and lots each had four values, each had three degrees of freedom, and the error had nine degrees of freedom. The calculated Fs could therefore have been compared with the Fs from the F table with three and nine degrees of freedom. If that were done, the experimenter could roughly say whether the calculated Fs were bigger or smaller than the corresponding Fs at 90, 95 or 99% confidence from a statistical table. (The Fs from the statistical table are given in Table 10 for reference.) In this example, in addition to the Fs, the computer calculated the exact probability of making the wrong statistical decision. If an experimenter said that there is a difference between the days' testing, he would be wrong 77 times out of 100, so the second hypothesis could not be rejected: no differences between the days were shown. However, if he said that there is a difference between the lots, he would be wrong only four times in 100, so the first hypothesis was rejected: there were differences between the lots.
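The two-way table behind this analysis (one dry time per lot per day) can be sketched in Python as below. This is a generic two-way ANOVA without replication, not the program used for Table 9, and the dry times are hypothetical numbers invented for illustration; they are not the data of Table 7.

```python
def two_way_anova(table):
    """Two-way ANOVA without replication: table[i][j] is the single
    result for lot i on day j. Returns (SS, df, MS, F) for each source."""
    r, c = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (r * c)
    lot_means = [sum(row) / c for row in table]
    day_means = [sum(table[i][j] for i in range(r)) / r for j in range(c)]

    ss_lots = c * sum((m - grand) ** 2 for m in lot_means)
    ss_days = r * sum((m - grand) ** 2 for m in day_means)
    # Error: what remains in each result after removing the lot and day effects
    ss_error = sum(
        (table[i][j] - lot_means[i] - day_means[j] + grand) ** 2
        for i in range(r)
        for j in range(c)
    )
    df_lots, df_days = r - 1, c - 1
    df_error = df_lots * df_days
    ms_lots, ms_days = ss_lots / df_lots, ss_days / df_days
    ms_error = ss_error / df_error
    return {
        "lots": (ss_lots, df_lots, ms_lots, ms_lots / ms_error),
        "days": (ss_days, df_days, ms_days, ms_days / ms_error),
        "error": (ss_error, df_error, ms_error, None),
    }


# Hypothetical dry times in minutes (rows = lots A-D, columns = days 1-4);
# these are NOT the data of Table 7
dry = [
    [300, 420, 360, 380],
    [450, 500, 520, 470],
    [280, 350, 300, 330],
    [500, 560, 470, 530],
]
for source, (ss, df, ms, f) in two_way_anova(dry).items():
    f_text = "" if f is None else f", F = {f:.2f}"
    print(f"{source:>5}: SS = {ss:.1f}, df = {df}, MS = {ms:.1f}{f_text}")
```

With four lots and four days the degrees of freedom come out as 3, 3 and 9, as in the text. Each F would then be compared with F(3, 9) from a table (about 3.86 at 95% confidence) or converted to an exact probability with statistical software.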

The standard deviation for the experiment was calculated by taking the square root of the error Mean Square. One surprise was that the experimental error was so large: a standard deviation of 82 minutes. This means that if an experimenter determined only one dry time per sample, a calculation of confidence limits using t values would require a difference of about 160 minutes before the experimenter could confidently say the samples had different dry times. In this case, differences between lots could be seen because of the increased power that the large number of replicates gave the experiment.

From an evaluation of comparison circles, Lot A was not statistically different from Lot C, and Lot B was not statistically different from Lot D. However, the group of A and C was statistically different from the group of B and D. A review of the production history pinpointed a production variable that caused the two groups to have different dry times. As a result, the process, and so the product, was made more consistent. The process inconsistency probably would not have been found if only one dry time had been run per lot and if the experimental design had not been done.

This was an example of a two-way Analysis of Variance, because two variables were present. If ten variables were present, ANOVA could be used to construct a ten-way analysis. The ability to separate the variance for each variable is what makes ANOVA such a powerful analytical tool.

Where Else Can ANOVA Be Used in the Laboratory?

Analysis of Variance, ANOVA, is used for most statistically designed experimentation: comparison of several sets of data, as seen in the above examples; regression analysis; factorial statistics; mixture statistics; response surface analysis; Taguchi methods; robust testing; etc. These are all topics for future discussions.