Editor's Note: This article is Part 7 in a series. See PCI May 2002 for the previous installment.
Factorial Designs
Factorial is a mysterious word, jargon used by statisticians to keep everyone else in the dark. Factorial comes from the word factor. A factor is a synonym for variable: the variable to be studied by the researcher. Factorial designs come in many flavors. This article will focus on full factorial and central composite designs.
What are Factorial Designs Good for?
Factorial designs are eminently suitable to study continuous variables such as pressure, temperature, time, speed, acid number or viscosity, or nominal variables like sample, lot number, supplier, type, position, or grade. Factorial designs have been used to study chemical processes, analytical test method development, crop growth in agriculture, dose rates in the pharmaceutical industry and teaching effectiveness in education. Factorial designs are useful to identify which factors significantly affect what is being studied; to optimize processes to give the biggest, fastest, cheapest, etc.; and to evaluate experimental error - that is, the precision or variability of a process.
What is a Full Factorial Design?
A full factorial design studies all combinations of the variables of interest. If n factors are studied at p levels for each factor, the total number of experiments would be p^n (p raised to the power n). Table 1 shows how many experiments would be required for various values of n and p.
As n gets bigger the number of experiments increases dramatically. Traditional experimentation could not afford the money or time to study eight variables at five levels, which would take 5^8 = 390,625 experiments. (A researcher might do this using some of the new techniques for high throughput screening now used in pharmaceutical research.) Of course, every variable does not need to have the same number of levels. For, say, four variables, if there were w levels of the first variable, x levels of the second variable, y levels of the third variable and z levels of the fourth variable, the total number of experiments would be w * x * y * z. For example, if the first variable had two levels, the second variable had three levels, the third variable had four levels and the fourth variable had three levels, the total number of experiments would be 2 * 3 * 4 * 3 = 72.
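The counting rule above can be checked by enumerating the combinations directly. The following is a minimal Python sketch, not part of the original article; the function name full_factorial is a hypothetical helper chosen for illustration.

```python
from itertools import product
from math import prod


def full_factorial(levels_per_factor):
    """Return every combination of level indices, one tuple per experiment.

    levels_per_factor is a list such as [2, 3, 4, 3]: one entry per factor,
    giving the number of levels at which that factor is studied.
    (Hypothetical helper, named for illustration only.)
    """
    return list(product(*(range(n) for n in levels_per_factor)))


# Equal levels: n factors at p levels each need p**n experiments.
n, p = 8, 5
print(p ** n)

# Mixed levels: the count is the product of the individual level counts.
levels = [2, 3, 4, 3]
runs = full_factorial(levels)
print(len(runs), prod(levels))
```

Enumerating the runs this way is practical only for small designs; for eight variables at five levels the list would hold several hundred thousand rows, which is the point the paragraph above makes.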
If all the factors to be studied are continuous variables, a step-wise scheme can be used (see reference 1). The study begins with a two level design, where p = 2, for all factors. An example of a full factorial design for three variables at two levels is shown graphically in Figure 1 and numerically in Table 2. (Coded variables are usually used: if temperature is a variable to be studied between 100 and 200 °C, -1 would represent 100 °C, the low level, and +1 would represent 200 °C, the high level. Note that the table is arranged in "Standard Order" - in the table of experiments A varies the most rapidly - low, high, low, high, etc., then B - low, low, high, high, etc., and finally C - low, low, low, low, high, high, high, high.)
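Standard order and variable coding are easy to reproduce in a few lines. The sketch below is an illustration only; the helper names standard_order and decode are assumptions, not anything defined in the article, and the 100 to 200 °C range is the temperature example from the text.

```python
from itertools import product


def standard_order(n_factors):
    """Two-level full factorial in standard order (first factor varies fastest)."""
    # itertools.product varies its LAST element fastest, so each run is
    # reversed to make the FIRST factor (A) the fastest-changing one.
    return [tuple(reversed(run)) for run in product((-1, 1), repeat=n_factors)]


def decode(coded, low, high):
    """Convert a coded level (-1 = low, +1 = high) back to real units."""
    center = (high + low) / 2
    half_range = (high - low) / 2
    return center + coded * half_range


design = standard_order(3)
print(" A  B  C")
for a, b, c in design:
    print(f"{a:+d} {b:+d} {c:+d}")

# Temperature studied between 100 and 200 degrees C.
print(decode(-1, 100, 200), decode(+1, 100, 200))
```

The same decode function works for any coded value, including the center point (0), which maps to the midpoint of the range.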
How is the Data Analyzed?
The response data in Table 2 from the two-level factorial design is analyzed by use of an Analysis of Variance, or ANOVA (see reference 2). An ANOVA table is constructed as shown in Table 3.
The ANOVA table allows the experimenter to judge the significance of the main effect for each variable as well as the interactions between two variables or among three variables. In this case, only variables A and B are significant. This is seen by looking at the column marked Prob>F (see reference 3). Small numbers for Prob>F mean the effect is significant. For example, the effect seen when varying variable A would happen by chance only one time in 1,000 experiments, while the effect seen for varying B would happen by chance only two times in 1,000 experiments. The experimenter can conclude that changing A or B has a dramatic effect on the response. On the other hand, the effect seen when varying C could occur by chance 349 times out of 1,000; that is, varying C has only a random effect on the response. Similarly, the effects for the interactions were also due to random noise.
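The effects that an ANOVA table tests are computed from the design columns themselves. The sketch below shows only that first step, estimating main effects and interactions as "average at high minus average at low"; it does not perform the F test. The response values are hypothetical, noise-free numbers invented for illustration (they are not the Table 2 data), chosen so that only A and B matter.

```python
from itertools import product

# 2^3 design in standard order (A varies fastest), coded -1 / +1.
design = [tuple(reversed(run)) for run in product((-1, 1), repeat=3)]

# HYPOTHETICAL responses, invented for illustration only; NOT the Table 2 data.
# By construction the response depends on A and B but not on C.
responses = [10 + 3 * a + 2 * b for a, b, c in design]


def effect(contrast, y):
    """Mean response where the contrast is +1 minus mean where it is -1."""
    high = [yi for ci, yi in zip(contrast, y) if ci > 0]
    low = [yi for ci, yi in zip(contrast, y) if ci < 0]
    return sum(high) / len(high) - sum(low) / len(low)


# Interaction contrasts are the element-wise products of the factor columns.
contrasts = {
    "A": [a for a, b, c in design],
    "B": [b for a, b, c in design],
    "C": [c for a, b, c in design],
    "AB": [a * b for a, b, c in design],
    "AC": [a * c for a, b, c in design],
    "BC": [b * c for a, b, c in design],
    "ABC": [a * b * c for a, b, c in design],
}

effects = {name: effect(col, responses) for name, col in contrasts.items()}
for name, value in effects.items():
    print(f"{name:>3}: {value:+.1f}")
```

With real, noisy data none of the effects would come out exactly zero, which is why the ANOVA and its Prob>F column are needed to separate real effects from random variation.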
How is Curvature Characterized?
In the ANOVA table, the Prob>F for curvature is 0.027; this means that the chance that curvature is not present is only 27 out of 1,000. Although the full factorial design with center points identified the fact that higher order terms, or curvature, are present in the response surface equation, this design cannot determine which variable or variables have higher order terms. (See reference 1 for a description of how center points help identify curvature.) To determine which variables have curvature, a second design is run using "star points" to augment the experiments from the first design. Table 4 shows the star point experiments that would be added to a two level factorial design and Figure 2 shows this graphically. (Continuing the example above, the coded variable for temperature of -1.68 would represent 66 °C and +1.68 would represent 234 °C.) Additional center points are run to determine if anything has changed since the first design was run.
Final Considerations
Conducting an experiment using a full factorial design is relatively straightforward: the experimenter just has to do all the possible combinations. In the previous example, the definition of the quadratic predictive equation was broken into two steps using a total of 22 experiments: 12 in the first design (eight corner points and four center points) and 10 in the second (six star points and four more center points). This was done to save work if no quadratic terms were present in the equation. If curvature were insignificant, the linear predictive equation and response surface would be obtained with only 12 experiments. If the experimenter absolutely knows that the response surface will be quadratic, experiments from both the full factorial design and the star point design would be combined in a single design. Only one set of four center points would be needed, so that a total of only 18 experiments would be run to get the quadratic predictive equation and response surface.
Another way to save experiments would be to use the assumption that if the linear terms are not significant, the quadratic terms won't be either. In the previous example, C was not significant in the first design. An experimenter could be relatively safe to assume that the squared term for C would also not be significant. In that case, the two C star points would not need to be run.
These savings do not seem significant when only three variables are being considered. But if five or more variables are being studied, considerable savings could be accomplished by running the study in the two steps, since it is unlikely that all the variables would need to be carried forward.
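The run counts quoted in this section can be checked by building the three-variable designs explicitly. The following is a minimal sketch with hypothetical helper names (corner_points, star_points, center_points). The star distance of 1.68 is the value used in the article; it matches a commonly used choice, the fourth root of the number of corner points (8 to the power 1/4 is about 1.682), although the article does not say how the value was chosen.

```python
from itertools import product


def corner_points(k):
    """Two-level full factorial corners in coded units, standard order."""
    return [tuple(reversed(r)) for r in product((-1.0, 1.0), repeat=k)]


def star_points(k, alpha):
    """Two axial points per factor at -alpha and +alpha, others at 0."""
    points = []
    for i in range(k):
        for level in (-alpha, alpha):
            point = [0.0] * k
            point[i] = level
            points.append(tuple(point))
    return points


def center_points(k, count):
    """Replicated center point: every factor at its coded midpoint, 0."""
    return [(0.0,) * k for _ in range(count)]


k = 3
alpha = 1.68  # star distance quoted in the article

first = corner_points(k) + center_points(k, 4)          # step 1: linear model
second = star_points(k, alpha) + center_points(k, 4)    # step 2: curvature
combined = corner_points(k) + star_points(k, alpha) + center_points(k, 4)

print(len(first), len(second), len(first) + len(second), len(combined))

# Star levels for temperature, using the 100 to 200 degrees C example
# (center 150, half-range 50).
low_temp = 150.0 - alpha * 50.0
high_temp = 150.0 + alpha * 50.0
print(round(low_temp), round(high_temp))
```

Dropping the star points for a factor judged insignificant, as suggested above for C, amounts to removing that factor's two rows from the star_points list.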
The next article will consider how to make even more savings when many variables are to be studied. For example, if eight variables were to be studied using a two level, full factorial design augmented with star points, the design would consist of 256 corner points, 16 star points and four center points, or 276 experiments in all. Since a quadratic predictive equation using eight variables has only 45 coefficients (one constant, eight linear terms, 28 two-variable interaction terms and eight squared terms), as few as 46 experiments would need to be run. The next article will begin a discussion on how a researcher could select some fraction of the full factorial design - a fractional factorial design - and still be able to define the predictive equation.
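The comparison between the size of the design and the size of the equation can be tabulated for any number of variables. This is a small sketch using hypothetical function names; the four center points are the number used in the article's example and are an assumption for other cases.

```python
from math import comb


def quadratic_coefficients(k):
    """Number of coefficients in a full quadratic equation in k variables."""
    constant = 1
    linear = k
    interactions = comb(k, 2)  # one cross term per pair of variables
    squares = k
    return constant + linear + interactions + squares


def composite_runs(k, center=4):
    """Corner points + star points + center points for k variables."""
    return 2 ** k + 2 * k + center


for k in (3, 5, 8):
    print(k, composite_runs(k), quadratic_coefficients(k))
```

The gap between the two columns grows quickly with the number of variables, which is the motivation for running only a fraction of the corner points.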
References
1 See "deSigns of the Times: Selecting experiments," PCI May 2002.
2 See "deSigns of the Times: or, When an F is a passing grade," PCI Aug. 2001, for a description of the contents of an ANOVA table.
3 Prob > F is the probability that a researcher would see this effect, that is, the probability that an F value this big would be seen simply on the basis of chance.