Original post by Jerry W. Thomas
A perfectly designed sampling plan often ends up with too many women and not enough men
completing the survey, or too many older people and not enough younger people. Data weighting might
make sense in these cases if you want totals that accurately reflect the whole population. The term
"data weighting" in most survey-related instances refers to respondent weighting (which in turn weights
the data or weights the answers). Instead of a respondent counting as one (1) in the cross-tabulations,
that respondent might count as 1.25 respondents or .75 respondents. Here are some best practices to
keep in mind when you are thinking about weighting survey data.
If possible, always perfectly balance the sample during the sampling and screening process so
you never have to weight any data. This is almost always the best and most defensible solution.
If you do decide to weight survey data, remember there is a price to pay. Nothing in life is free.
The cost of weighting data is reduced accuracy: sampling variance, standard deviation, and
standard error increase.
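One way to put a number on that price is Kish's approximate effective sample size, (sum of weights)^2 / (sum of squared weights). The sketch below is illustrative only: the weights are invented, the function name is hypothetical, and the approximation assumes the weights are unrelated to the answers being measured.

```python
import math

def kish_effective_n(weights):
    """Kish's approximation: (sum of w)^2 / (sum of w^2)."""
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    return total * total / total_sq

# Hypothetical sample of 100: 60 respondents down-weighted, 40 up-weighted.
weights = [0.75] * 60 + [1.375] * 40

n = len(weights)
n_eff = kish_effective_n(weights)   # about 91.4 "effective" respondents
design_effect = n / n_eff           # variance inflation caused by the weights

# Approximate 95% margin of error for a 50% result, before and after weighting.
moe_unweighted = 1.96 * math.sqrt(0.25 / n)
moe_weighted = 1.96 * math.sqrt(0.25 / n_eff)

print(f"n = {n}, effective n = {n_eff:.1f}, design effect = {design_effect:.3f}")
print(f"margin of error: {moe_unweighted:.3f} unweighted, {moe_weighted:.3f} weighted")
```

The more the weights vary, the smaller the effective sample size becomes relative to the actual number of completed interviews.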
Remember that the cost of weighting data is higher (in terms of reduced accuracy) when the
sample size is smaller. If you have thousands of respondents, you can weight the data as much
as you please, and the cost in reduced accuracy is minimal. On the other hand, if you have fewer
than 100 respondents, the cost in reduced accuracy might be very great. Be especially cautious
in weighting data when sample sizes are small.
In deciding whether and how to weight survey data, it's a good idea to review the cross-tabs to
see which demographic (or other) variables appear to have the greatest impact on the answers.
For example, if men and women give very similar answers, weighting the sample by gender will
have little effect on the percentages in your tabulations. On the other hand, if different age
groups give different answers, weighting by age will change the numbers in your tabulations.
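A small calculation shows why this check matters. The figures below are made up for illustration (a 60/40 sample corrected to a 50/50 population), and the function name is hypothetical; the point is only that the weighted total moves in proportion to how much the groups disagree.

```python
def unweighted_and_weighted(groups):
    """groups: list of (sample_count, pct_yes, population_share) tuples."""
    n_total = sum(count for count, _, _ in groups)
    unweighted = sum(count * pct for count, pct, _ in groups) / n_total
    weighted = sum(share * pct for _, pct, share in groups)
    return unweighted, weighted

# Hypothetical: women and men give nearly the same answer.
similar_groups = [(60, 42.0, 0.5), (40, 40.0, 0.5)]
# Hypothetical: younger and older respondents give very different answers.
different_groups = [(60, 70.0, 0.5), (40, 30.0, 0.5)]

for label, groups in [("similar", similar_groups), ("different", different_groups)]:
    unweighted, weighted = unweighted_and_weighted(groups)
    shift = weighted - unweighted
    print(f"{label:<10} unweighted {unweighted:5.1f}%  weighted {weighted:5.1f}%  shift {shift:+.1f}")
```

With similar answers the total barely moves (a fraction of a point); with divergent answers the same weights shift it by several points.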
When data must be weighted, weight by as few variables as possible. As the number of
weighting variables goes up, so does the risk that the weighting of one variable will confuse
or interact with the weighting of another variable.
When data must be weighted, try to minimize the sizes of the weights. A general rule of thumb
is never to weight a respondent less than .5 (a 50% weighting) nor more than 2.0 (a 200%
weighting).
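The rule of thumb is easy to check mechanically. The sketch below flags weights outside the 0.5 to 2.0 range and, as one possible remedy that the article itself does not prescribe, trims them to the limits. The function names and sample weights are hypothetical, and trimming is an assumption on my part: it means the weighted totals will no longer match the population targets exactly.

```python
def flag_extreme_weights(weights, low=0.5, high=2.0):
    """Return the positions of weights that fall outside [low, high]."""
    return [i for i, w in enumerate(weights) if w < low or w > high]

def trim_weights(weights, low=0.5, high=2.0):
    """Clamp every weight into [low, high]; totals may drift from the targets."""
    return [min(max(w, low), high) for w in weights]

# Hypothetical respondent weights produced by some weighting scheme.
weights = [0.3, 0.8, 1.0, 1.6, 2.7]

flagged = flag_extreme_weights(weights)
trimmed = trim_weights(weights)

print("positions outside the rule of thumb:", flagged)
print("trimmed weights:", trimmed)
```

If many respondents are flagged, that is usually a sign the sample is too far out of balance for weighting alone to repair.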
Keep in mind that up-weighting data (weight > 1.0) is typically more dangerous than
down-weighting data (weight < 1.0). In up-weighting, you have too few respondents and pretend that
those respondents each count for more than one person; the greater the up-weight, the more
those respondents' answers count.
A best practice is to create two sets of cross-tabulations: one set weighted and one set
unweighted. Look at these two sets of cross-tabulations side by side to make sure all the
numbers look reasonable.
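The side-by-side comparison can be sketched in a few lines. The data here is invented (six women weighted down to 5/6 each, four men weighted up to 1.25 each), and the function name is hypothetical; a real tabulation package would do the same thing across every question and banner column.

```python
from collections import Counter

def answer_percentages(rows, weighted):
    """rows: list of (answer, weight). Returns {answer: percent of total}."""
    counts = Counter()
    for answer, weight in rows:
        counts[answer] += weight if weighted else 1.0
    total = sum(counts.values())
    return {answer: 100.0 * count / total for answer, count in counts.items()}

# Hypothetical respondents: women (weight 5/6) lean "yes", men (weight 1.25) lean "no".
rows = (
    [("yes", 5 / 6)] * 4 + [("no", 5 / 6)] * 2    # six women
    + [("yes", 1.25)] * 1 + [("no", 1.25)] * 3    # four men
)

unweighted = answer_percentages(rows, weighted=False)
weighted = answer_percentages(rows, weighted=True)

for answer in sorted(unweighted):
    print(f"{answer:<4} unweighted {unweighted[answer]:5.1f}%   weighted {weighted[answer]:5.1f}%")
```

Large gaps between the two columns are worth a second look before the weighted numbers go into a report.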
Most widely used tabulation systems and statistical packages use Iterative Proportional Fitting (or
something similar) to weight survey data, a method popularized by the statistician Deming about 75
years ago.
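For readers who have not seen Iterative Proportional Fitting (often called raking), here is a minimal sketch of the idea: adjust the weights so one variable matches its target, then the next variable, and repeat until the weights stop changing. This is an illustration under simple assumptions, not how any particular package implements it; the function names, sample counts, and targets are all hypothetical.

```python
def rake(respondents, targets, iterations=50, tol=1e-9):
    """respondents: list of dicts, e.g. {"gender": "F", "age": "young"}.
    targets: {variable: {category: population_share}}.
    Returns one weight per respondent (weights average 1.0 at convergence)."""
    n = len(respondents)
    weights = [1.0] * n
    for _ in range(iterations):
        max_change = 0.0
        for var, shares in targets.items():
            # Current weighted count in each category of this variable.
            totals = {}
            for r, w in zip(respondents, weights):
                totals[r[var]] = totals.get(r[var], 0.0) + w
            # Scale each respondent so the category hits its target count.
            for i, r in enumerate(respondents):
                factor = shares[r[var]] * n / totals[r[var]]
                max_change = max(max_change, abs(factor - 1.0))
                weights[i] *= factor
        if max_change < tol:
            break
    return weights

def weighted_total(respondents, weights, var, category):
    """Sum of weights for respondents in one category."""
    return sum(w for r, w in zip(respondents, weights) if r[var] == category)

# Hypothetical sample of 100: too many women and too many older respondents.
respondents = (
    [{"gender": "F", "age": "young"}] * 20 + [{"gender": "F", "age": "old"}] * 40
    + [{"gender": "M", "age": "young"}] * 10 + [{"gender": "M", "age": "old"}] * 30
)
targets = {"gender": {"F": 0.5, "M": 0.5}, "age": {"young": 0.4, "old": 0.6}}

weights = rake(respondents, targets)
print("weighted women:", round(weighted_total(respondents, weights, "gender", "F"), 2))
print("weighted young:", round(weighted_total(respondents, weights, "age", "young"), 2))
print("largest weight:", round(max(weights), 2), " smallest weight:", round(min(weights), 2))
```

Each pass over one variable slightly disturbs the balance on the other, which is why the procedure has to iterate, and why adding more weighting variables tends to produce more extreme weights.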
Do not despair if you weight your survey data and the results are not what you hoped for. There are
hundreds of weighting schemes and algorithms, and each has its hidden assumptions and biases. So if you don't get the results you want with one weighting scheme, remember there are hundreds of other ways to weight the data: one of those might give you the answers your boss seeks. Of course, this is the reason for the first bullet point in this article.