Original post by Jerry W. Thomas
A perfectly designed sampling plan often ends up with too many women and not enough men completing the survey, or too many older people and not enough younger people. In these cases, data weighting might make sense if you want totals that accurately reflect the whole population. In most survey work, the term "data weighting" refers to respondent weighting (which in turn weights the answers). Instead of counting as one (1) respondent in the cross-tabulations, a respondent might count as 1.25 respondents or 0.75 respondents. Here are some best practices to keep in mind when you are thinking about weighting survey data.
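As a minimal sketch of respondent weighting on a single variable (simple cell weighting, often called post-stratification), each respondent's weight can be taken as their group's population share divided by that group's share of the sample. The function name `cell_weights`, the 60/40 sample, the assumed 50/50 population split, and the "yes" answers are all hypothetical, chosen only to illustrate the arithmetic.

```python
from collections import Counter

def cell_weights(sample_groups, population_shares):
    """Weight per group = population share / sample share (hypothetical helper)."""
    counts = Counter(sample_groups)
    n = len(sample_groups)
    return {g: population_shares[g] / (counts[g] / n) for g in counts}

# Hypothetical survey: 60 women and 40 men completed it,
# but the population is assumed to be 50% women and 50% men.
respondents = ["female"] * 60 + ["male"] * 40
targets = {"female": 0.5, "male": 0.5}
weights = cell_weights(respondents, targets)  # men get 1.25, women about 0.83

# Hypothetical answers, aligned with respondents:
# 30 of the 60 women and 28 of the 40 men said "yes".
answers = [True] * 30 + [False] * 30 + [True] * 28 + [False] * 12

resp_w = [weights[g] for g in respondents]
unweighted = 100 * sum(answers) / len(answers)
weighted = 100 * sum(w for w, a in zip(resp_w, answers) if a) / sum(resp_w)
print(f"unweighted: {unweighted:.1f}%  weighted: {weighted:.1f}%")
# prints: unweighted: 58.0%  weighted: 60.0%
```

Because men and women answer differently in this made-up example, rebalancing by gender moves the total by two points; if both groups had answered alike, the weighted and unweighted totals would match.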
- If possible, always perfectly balance the sample during the sampling and screening process so you never have to weight any data. This is almost always the best and most defensible solution.
- If you do decide to weight survey data, remember there is a price to pay. Nothing in life is free. The cost of weighting data is reduced precision: the sampling variance and standard error of your estimates increase, so confidence intervals get wider.
- Remember that the cost of weighting data (in terms of reduced precision) is higher when the sample size is smaller. If you have thousands of respondents, moderate weighting usually costs little in absolute precision. If you have fewer than 100 respondents, on the other hand, the cost might be very great. Be especially cautious in weighting data when sample sizes are small.
- In deciding whether and how to weight survey data, it's a good idea to review the cross-tabs to see which demographic (or other) variables appear to have the greatest impact on the answers. For example, if men and women give very similar answers, weighting the sample by gender will have little effect on the percentages in your tabulations. On the other hand, if different age groups give different answers and the sample is out of balance by age, weighting by age will change the numbers in your tabulations.
- When data must be weighted, weight by as few variables as possible. The more weighting variables you use, the greater the risk that weighting on one variable will distort or interact with weighting on another.
- When data must be weighted, try to keep the weights close to 1.0. A general rule of thumb is never to give a respondent a weight below 0.5 (a 50% weighting) or above 2.0 (a 200% weighting).
- Keep in mind that up-weighting data (weight > 1.0) is typically more dangerous than down-weighting data (weight < 1.0). In up-weighting, you have too few respondents and pretend that each of them counts for more than one person; the greater the up-weight, the more those few respondents' answers, quirks included, drive the totals.
- A best practice is to create two sets of cross-tabulations: one set weighted and one set unweighted. Look at these two sets of cross-tabulations side by side to make sure all the numbers look reasonable.
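One common way to put a number on the precision cost described in these bullets is Kish's approximation: the design effect due to weighting is 1 + CV² of the weights, and the effective sample size is (Σw)² / Σw². The sketch below computes both and also clips weights to the 0.5 to 2.0 rule-of-thumb range. The helper names and both sets of weights are hypothetical. Note that clipping alone means the weighted totals no longer hit their targets exactly, so in practice trimming is usually followed by another round of weighting.

```python
def effective_sample_size(weights):
    """Kish's approximate effective sample size: (sum w)^2 / sum(w^2)."""
    total = sum(weights)
    return total * total / sum(w * w for w in weights)

def design_effect(weights):
    """Approximate variance inflation from weighting: n / n_eff = 1 + CV^2."""
    return len(weights) / effective_sample_size(weights)

def trim_weights(weights, low=0.5, high=2.0):
    """Clip each weight into [low, high] (rule-of-thumb bounds)."""
    return [min(max(w, low), high) for w in weights]

# Hypothetical mild weighting: 60 women at 0.5/0.6 and 40 men at 1.25.
mild = [0.5 / 0.6] * 60 + [1.25] * 40
# Hypothetical heavy weighting: a small subgroup up-weighted to 2.8.
heavy = [0.8] * 90 + [2.8] * 10

for label, w in [("mild", mild), ("heavy", heavy),
                 ("heavy, trimmed", trim_weights(heavy))]:
    print(f"{label}: n = {len(w)}, effective n = {effective_sample_size(w):.1f}, "
          f"design effect = {design_effect(w):.2f}")
```

The standard error of a percentage shrinks roughly with 1/√(effective n), so the same design effect costs far more percentage points when n is 100 than when n is several thousand, which is why small samples call for extra caution.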
Most widely used tabulation systems and statistical packages weight survey data with Iterative Proportional Fitting (also known as raking) or something similar, a method popularized by the statisticians W. Edwards Deming and Frederick F. Stephan in 1940.
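Applied to respondent weights, iterative proportional fitting adjusts the weights to match one variable's population margins, then the next variable's, and repeats the cycle until every margin matches. The sketch below is a minimal illustration, assuming categorical variables, population shares that sum to 1, and every respondent's category present in the targets. The `rake` function and the sample data are hypothetical. Real tabulation packages add convergence diagnostics, weight caps, and handling for empty cells.

```python
def rake(respondents, targets, max_iter=100, tol=1e-8):
    """Iterative proportional fitting (raking) of respondent weights.

    respondents: list of dicts, e.g. {"gender": "female", "age": "45+"}
    targets: {variable: {category: population share}}, shares summing to 1.
    Returns one weight per respondent; the weights sum to len(respondents).
    """
    n = len(respondents)
    weights = [1.0] * n
    for _ in range(max_iter):
        max_change = 0.0
        for var, shares in targets.items():
            # Current weighted total for each category of this variable.
            totals = {cat: 0.0 for cat in shares}
            for r, w in zip(respondents, weights):
                totals[r[var]] += w
            # Rescale so this variable's weighted margins hit the targets.
            for i, r in enumerate(respondents):
                factor = shares[r[var]] * n / totals[r[var]]
                weights[i] *= factor
                max_change = max(max_change, abs(factor - 1.0))
        # Stop once a full cycle leaves every weight essentially unchanged.
        if max_change < tol:
            break
    return weights

# Hypothetical sample of 100 with too many women and too many older people.
sample = (
    [{"gender": "female", "age": "18-44"}] * 20
    + [{"gender": "female", "age": "45+"}] * 40
    + [{"gender": "male", "age": "18-44"}] * 20
    + [{"gender": "male", "age": "45+"}] * 20
)
targets = {
    "gender": {"female": 0.5, "male": 0.5},
    "age": {"18-44": 0.5, "45+": 0.5},
}
raked = rake(sample, targets)
print({(r["gender"], r["age"]): round(w, 3) for r, w in zip(sample, raked)})
```

Raking matches each variable's margins but not the joint gender-by-age distribution; it quietly assumes the sample's interaction pattern is right, which is one example of the hidden assumptions behind any weighting scheme.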
Do not despair if you weight your survey data and the results are not what you hoped for. There are hundreds of weighting schemes and algorithms, and each has its hidden assumptions and biases. So if you don't get the results you want with one weighting scheme, remember there are hundreds of other ways to weight the data: one of those might give you the answers your boss seeks. Of course, this is the reason for the first bullet point in this article.
Contact Symmetric today for expert advice on Data Weighting!