Identifying Confounding Variables

Identifying and controlling potential confounding variables is the single most important task faced by researchers. If a confounding variable is allowed to affect the results of a study, no meaningful conclusions can be drawn from the hard work of designing and running the study. Consequently, the vast majority of research design methodology is devoted to this single task.

A variable can confound the results of a research study only if

  1. It has an impact on the dependent variable, and
  2. There are differences on this variable between the groups of the study.

If the potentially confounding variable has no impact on the dependent variable, then it cannot influence our conclusions. Even if it does influence the dependent variable, if the groups are equivalent on this potential confounding variable, then any effect it has on the dependent variable should be the same in all groups, thus not affecting the relative differences between groups. In research studies, we look at those relative differences to see evidence of the effect of the independent variable on the dependent variable.

A simple hypothetical example can illustrate this point. In the most basic research design, we would compare the performance of two groups of research participants on the dependent variable. Let's assume for this example that the potential confounding variable is sex and that women score higher than men on this hypothetical dependent variable. If one group has a higher proportion of men than the other, we would expect the group means to differ because of this sex difference (our confounding variable in this study) regardless of the effect of the independent variable. The effect of the independent variable may be in the same direction as the confounding variable, in the opposite direction, or the independent variable may have no effect. The problem is that we can determine the effect of the independent variable on the dependent variable only by observing the differences between our groups. With the existence of the confounding variable, the difference between the groups is no longer a valid indicator of the effect of the independent variable alone.

However, if the two groups in our hypothetical study had the same number of men and women, whatever impact the variable of sex has on the dependent measure would be the same in each group, thus not affecting the difference between the groups. Consequently, assuming there are no other confounding variables, the difference between the groups will be a valid indicator of the effect of the independent variable on the dependent variable.
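
The hypothetical example can be made concrete with a small simulation. The sketch below is only an illustration under invented assumptions: the baseline score of 50, the 10-point advantage for women, the noise level, and the group sizes are all made up, and the independent variable is given no effect at all.

```python
import random
from statistics import mean

# Hypothetical model, chosen only for illustration: every score starts at 50,
# women score 10 points higher than men, and individual noise has SD 5.
BASELINE = 50.0
SEX_EFFECT = 10.0
NOISE_SD = 5.0


def simulate_group(n_women, n_men, treatment_effect, rng):
    """Return simulated dependent-variable scores for one group."""
    scores = []
    for is_woman in [True] * n_women + [False] * n_men:
        score = BASELINE + (SEX_EFFECT if is_woman else 0.0)
        score += treatment_effect + rng.gauss(0.0, NOISE_SD)
        scores.append(score)
    return scores


rng = random.Random(0)  # fixed seed so the run is repeatable

# Groups differ in sex composition; the independent variable does nothing.
confounded_diff = mean(simulate_group(800, 200, 0.0, rng)) - mean(
    simulate_group(200, 800, 0.0, rng)
)

# Same composition in both groups; the independent variable still does nothing.
balanced_diff = mean(simulate_group(500, 500, 0.0, rng)) - mean(
    simulate_group(500, 500, 0.0, rng)
)

print(f"unequal sex composition: difference = {confounded_diff:.2f}")
print(f"equal sex composition:   difference = {balanced_diff:.2f}")
```

Under these assumptions the first difference should land near 6 points, which is (0.8 - 0.2) times the 10-point sex effect, even though the treatment effect is zero. The second difference should be close to zero, because the sex effect is present in equal measure in both groups.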

Identifying Potential Confounding Variables

Identifying potential confounding variables is a combination of good theory, careful analysis, and thorough detective work. Theory tells us where to look for confounding variables. The better a phenomenon is understood, the more detailed and helpful theories will be. The theories can help point us in the right direction, but we must be systematic in listing the many potential confounding variables if we want to have any chance of controlling them. The detective work involves looking for evidence in the research literature about each potential confounding variable and its relationship to our dependent variable. This evidence will tell us whether the theoretically significant confounding variable is likely to be of practical significance in our research study.

We can illustrate this point with an example. Suppose we are interested in investigating the social relationships of introverted and extroverted high school freshmen. Now these are preexisting groups, where participants are assigned to the groups on the basis of personality characteristics. We should always expect that preexisting groups will differ, not only on the variable that defined the group, but also on other variables that might confound our results. For example, we might expect that introverted and extroverted high school students may differ on (1) the number of friends they have, (2) the way they relate to their friends, (3) their activities, (4) the behavior of their parents toward them and their friends, (5) the type of part-time job they might get, (6) the way in which they study, (7) their level of self-esteem, and (8) how other students and teachers respond to them. All of these are predictable theoretically because we would expect that the introversion/extroversion dimension should influence them.

We might also expect other differences such as (1) the way they look and (2) even slight differences in age. These we would predict theoretically because they might have contributed to the introversion/extroversion dimension. For example, someone who is born more attractive may well be more likely to be extroverted, whereas someone with an equal tendency toward extroversion who is less attractive may have become more introverted because they were less accepted by peers and others. How could a few months' difference in age affect introversion? At high school age, probably very little, but a few months' difference in age could make a big difference at 4 or 5 when children first start attending school or preschool. Perhaps the children who were a few months younger than their peers then were overwhelmed by the older children and the demands of school and started to withdraw, becoming more introverted. Identifying all the potential confounding variables can be a wonderful challenge and a lot of fun.

Once we have identified potential confounding variables, because they are theoretically related to our independent variable and, therefore, may differ in our groups, we must then determine whether they are related to the dependent variable. This is where the detective work comes in. We must search the research literature to find out if the potential confounding variables are correlated with the dependent variable(s) in our study. Occasionally we will find a study whose primary purpose is to look at such relationships, but much more often we will find studies of related phenomena that happened to measure these potential confounding variables and happened to compute their correlation with our dependent variable or with another variable very similar to our dependent variable. [We will talk shortly about how these things just "happen" to be a part of such studies.]

If the variables are uncorrelated with our dependent variable, they do not affect the dependent variable and therefore cannot confound our results. In effect, we have ruled them out. If they are correlated, they may confound our results but ONLY if our groups actually differ on them. The research literature can often tell us if our theoretical relationship between our independent variable and the potential confounding variable actually exists in practice. If it does not, then we have again ruled out this potential confounding because our groups are unlikely to differ on this potential confounding variable. If the potential confounding variable and the independent variable are actually related, we have to take additional steps to rule out confounding.

Controlling Known Confounding Variables

A theoretical analysis and a thorough search of the research literature can identify many potential confounding variables and even rule out several of them, but it is rare that we can identify all potential confounding variables and even more rare that we can rule them out before doing the study. What we are left with is a short list of candidate confounding variables that we must monitor in our study. Normally, we would measure each of these variables in our study along with the independent and dependent variables. We then are able to (1) correlate each of these potential confounding variables with the dependent variable(s) to determine whether the relationship is strong enough to seriously affect our dependent variable(s) and (2) compare the groups to see how different the groups are on this potential confounding variable. This, by the way, is how you will find published research studies that just "happen" to report correlations between dependent variables and potential confounding variables. Those researchers were taking steps to control confounding and are reporting their findings to help future researchers with the same concerns.
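
The two checks described above can be sketched in a few lines. This is a minimal illustration, not a prescribed procedure: the data are invented, the potential confounding variable is taken to be the number of weekly activities, and the use of Pearson's r and a pooled-SD standardized difference (Cohen's d) is our assumption about how one might quantify "correlation" and "group difference".

```python
from statistics import mean, stdev


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return cov / (ss_x * ss_y) ** 0.5


def standardized_difference(group_a, group_b):
    """Difference in group means divided by the pooled standard deviation."""
    na, nb = len(group_a), len(group_b)
    pooled_var = (
        (na - 1) * stdev(group_a) ** 2 + (nb - 1) * stdev(group_b) ** 2
    ) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / pooled_var ** 0.5


# Invented data for illustration only.
introvert_activities = [1, 2, 2, 3, 1, 2]
extrovert_activities = [3, 4, 5, 4, 3, 5]
introvert_dv = [10, 12, 11, 14, 9, 12]   # hypothetical dependent-variable scores
extrovert_dv = [15, 17, 19, 16, 14, 20]

# Check 1: is the potential confound related to the dependent variable?
r = pearson_r(
    introvert_activities + extrovert_activities,
    introvert_dv + extrovert_dv,
)

# Check 2: do the groups differ on the potential confound?
d = standardized_difference(introvert_activities, extrovert_activities)

print(f"correlation with dependent variable: r = {r:.2f}")
print(f"group difference on the confound:    d = {d:.2f}")
```

In this made-up data set both values are large, so the number of activities could not be ruled out as a confounding variable. How small a value must be before we treat it as negligible is a judgment call that the sketch deliberately leaves open.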

If you are lucky, you will find that either (1) the correlation between the potential confounding variable and your dependent variables is small or (2) the difference between the groups on the potential confounding variable is small. Either finding is sufficient to rule out confounding because a variable cannot confound the findings unless it both (1) is correlated with the dependent measure and (2) shows a difference between the groups. If you cannot rule out confounding at this level, then more detailed procedures are necessary. Sometimes it is possible to select subgroups within your main groups that do not differ on the potential confounding variables. For example, we may find that the types of activities do distinguish our introverted and extroverted groups, but that we can form subgroups of introverted and extroverted participants who are involved in similar activities. Now this sample is likely to be biased, and we may be introducing new sources of confounding, but it gives us a chance to see if we get similar results when this one source of confounding is eliminated.
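
Selecting matched subgroups can be sketched as follows. This is one simple way to do it, assuming exact matching on a single measured variable; the participant records, the field names, and the `match_on` helper are all hypothetical.

```python
from collections import defaultdict


def match_on(participants, key, group_a="introvert", group_b="extrovert"):
    """Keep equal numbers of each group at every value of `key`.

    Participants with no counterpart in the other group are dropped.
    """
    buckets = defaultdict(lambda: defaultdict(list))
    for person in participants:
        buckets[person[key]][person["group"]].append(person)

    matched = []
    for by_group in buckets.values():
        n = min(len(by_group[group_a]), len(by_group[group_b]))
        matched.extend(by_group[group_a][:n] + by_group[group_b][:n])
    return matched


# Invented records for illustration only.
participants = [
    {"id": 1, "group": "introvert", "activities": 1},
    {"id": 2, "group": "introvert", "activities": 2},
    {"id": 3, "group": "introvert", "activities": 3},
    {"id": 4, "group": "extrovert", "activities": 2},
    {"id": 5, "group": "extrovert", "activities": 3},
    {"id": 6, "group": "extrovert", "activities": 3},
    {"id": 7, "group": "extrovert", "activities": 5},
]

matched = match_on(participants, "activities")
matched_ids = sorted(person["id"] for person in matched)
print("participants kept after matching:", matched_ids)
```

The matched subgroups are identical on the number of activities, but three of the seven participants were discarded to get there. The least active introvert and the most active extrovert are gone, which is exactly the kind of selection that can bias the sample.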

To illustrate how new sources of confounding can be introduced by such a matching procedure, let's assume that we found a group of introverts and a group of extroverts that are engaged in the same high school activities. Perhaps the introverts are engaged in more activities than they might normally want to do because their parents push them more or are more controlling (one potential confounding variable). Perhaps the extroverts engage in fewer activities because they are poorer and have to hold a part-time job after school (another potential confounding variable). Perhaps the introverts who engage in more activities are inherently better at those activities and therefore are more willing to do them than the extroverts doing the same activities. Extroverts do activities because they like being around people, whereas introverts are more selective, doing only those activities that they can excel at. See how easy it is to introduce new confounding even while you attempt to control for other confounding.

So how do we control for confounding? Typically, we use a converging series of studies with carefully selected groups. For example, some high schools may insist that students participate in certain activities or may exert strong pressure for all students to participate. A small high school, for example, may have so few kids that sports teams exist only if just about every qualified person joins the team, thus naturally controlling this potential confounding. Some schools may have teachers or guidance counselors who consistently encourage participation, thus reducing the effect of this potential source of confounding. If you know what confounding variables are most important to control, you can often find just the right samples that control for the confounding without adding new sources of confounding.

Controlling Confounding Variables that are Unknown

Most of this unit focused on controlling the confounding variables that we specifically identify, and in many research studies we have to identify the confounding variables to be able to design specific controls for eliminating their effects. There is one situation, however, where confounding variables can be controlled even when we have no idea what they are. If we use an experimental design with random assignment to groups, we are able to equate the groups statistically on all variables, including variables that could confound the results. By equating the groups before the study, we reduce the impact of any potential confounding variable, even if we have no idea what it is.

A critical advantage of experimental research designs is that experimental procedures equate the groups statistically. This does not mean that the groups will be exactly alike on all variables. Rather, it means that the degree of expected deviation from equality can be statistically quantified and used in our inferences about the groups. Remember that inferential statistics compute the probability of observing a difference at least as large as the one we found if the groups were in fact equivalent (i.e., if the null hypothesis were true). If that probability is small, we conclude that it is unlikely that the groups are equivalent and we state that the groups are significantly different statistically from one another. We are NOT saying that they are different, only that they are likely to be different. We are also not saying that there is no confounding, only that confounding is unlikely. It is always possible that, by chance, the groups started out unequal on a potential confounding variable, and thus this particular study was confounded. Remember that even when we sample participants randomly and assign them to groups randomly, there is variability in how representative the samples are--variability that we refer to as sampling error. This is why replication is so important. Even in the best possible research design--the experiment--there is still a small chance of confounding. However, if the confounding is a chance event, it is unlikely to appear in a second study on the same question. If both experimental studies produce the same result, we can be confident that confounding is unlikely to account for the findings in both studies.
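
A short simulation can show what "statistically equated" means in practice. The sketch below rests on invented assumptions: an unmeasured variable with a standard normal distribution, 40 participants per study, 2,000 simulated studies, and an arbitrary cutoff of half a standard deviation for calling an imbalance "large".

```python
import random
from statistics import mean


def covariate_imbalance(n_participants, rng):
    """Randomly assign participants to two equal groups and return the
    difference between the groups' means on an unmeasured variable."""
    covariate = [rng.gauss(0.0, 1.0) for _ in range(n_participants)]
    rng.shuffle(covariate)  # the random assignment step
    half = n_participants // 2
    return mean(covariate[:half]) - mean(covariate[half:])


rng = random.Random(1)  # fixed seed so the run is repeatable
LARGE = 0.5             # arbitrary cutoff, in standard deviation units

imbalances = [covariate_imbalance(40, rng) for _ in range(2000)]

# Across many studies the groups are equal on average ...
average_imbalance = mean(imbalances)

# ... but any single study can be noticeably unbalanced by chance.
share_large = sum(abs(d) > LARGE for d in imbalances) / len(imbalances)

# Treat consecutive studies as an original and its replication, and count
# pairs in which both are unbalanced in the same direction.
pairs = list(zip(imbalances[::2], imbalances[1::2]))
share_both = sum(
    (a > LARGE and b > LARGE) or (a < -LARGE and b < -LARGE) for a, b in pairs
) / len(pairs)

print(f"average imbalance across studies:      {average_imbalance:.3f}")
print(f"single studies with a large imbalance: {share_large:.1%}")
print(f"study pairs both off, same direction:  {share_both:.1%}")
```

Under these assumptions the average imbalance is close to zero, a noticeable minority of single studies are unbalanced by chance, and it is much rarer for a study and its replication to be unbalanced in the same direction. That is the sense in which replication protects against chance confounding.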


Confounding variables can confound the results of a study only if (1) they affect the dependent variables and (2) the groups differ on the confounding variable. Identifying potential confounding variables allows the researcher to rule out many of them by showing that one or the other of these conditions does not hold. Measuring confounding variables that cannot be ruled out can allow other procedures such as matching procedures, although one must be careful to not introduce other confounding with the matching. If one is able to use an experimental design with random assignment of participants to groups, virtually all confounding is controlled.

Copyright © 2000

Revised: June 3, 1999
URL: /handouts/confvar.htm