The Inconvenience of Online Non-probability Surveys


Marketing in a pandemic has forced brands to become hyper-aware of the powerful social, political and economic forces that influence consumer preferences. With more companies under growing pressure to curb spending and find new pockets of growth, marketers must ensure that their data continues to deliver an honest picture of their current and potential customer landscape. Meanwhile, data providers whose products and services power marketing decisions have a responsibility to deliver information quickly and accurately. The foundation for this level of performance is a solid data collection methodology. In the world of consumer insights and surveys, a central part of that methodology is sample selection: deciding which population respondents are recruited from, and how.

Convenience Samples - An Inconvenient Truth

Have you ever run a Google search for a statistic, only to receive multiple conflicting sources? Whether you’re looking for statistics on beer consumption, credit card usage or motor vehicle ownership, these data discrepancies often come down to critical differences in sampling methodologies.

One popular (and cheaper) method is known as convenience sampling, which uses online panels of respondents who have “opted in” to be contacted, usually on an ongoing basis and for any number of survey topics. Respondents in such panels are generally not drawn randomly from the population of interest; the panels typically consist of individuals who have made a conscious decision to monetize their use of the internet. While responses from these panelists are typically weighted to match US demographics, such samples often fail to be nationally representative. Moreover, bias in these panels tends to be unpredictable.

[Figure: Calibrating bias in online samples, summary]

For example, a 2019 MRI-Simmons study showed that online samples give severely biased estimates of consumer attitudes toward, and use of, the internet and technology, in ways that cannot be corrected with demographic weights.

This research also identified biases in important subsets of psychographics and brand usage.
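To see why demographic weights alone may not remove this kind of bias, consider a minimal sketch in Python. All numbers here are invented for illustration and are not taken from the MRI-Simmons study; the age groups, the "heavy internet user" measure and the helper names are assumptions. The panel skews young, and within each age group its members are also heavier internet users than the population at large.

```python
from collections import Counter

# Hypothetical population benchmarks (share of adults per age group).
population_share = {"18-44": 0.5, "45+": 0.5}
# Hypothetical true rate of heavy internet use within each age group.
true_rate = {"18-44": 0.6, "45+": 0.3}

# Hypothetical opt-in panel: (age_group, is_heavy_user) -> respondent count.
# Panelists skew young AND are heavier users within every age group.
panel_counts = {
    ("18-44", True): 630, ("18-44", False): 70,
    ("45+", True): 210, ("45+", False): 90,
}

def poststratification_weights(counts, benchmarks):
    """Weight per age group = population share / panel share."""
    total = sum(counts.values())
    group_totals = Counter()
    for (group, _), n in counts.items():
        group_totals[group] += n
    return {g: benchmarks[g] / (group_totals[g] / total) for g in group_totals}

def heavy_user_rate(counts, weights=None):
    """Share of heavy users, optionally weighted by age group."""
    if weights is None:
        weights = {group: 1.0 for group, _ in counts}
    num = sum(n * weights[g] for (g, heavy), n in counts.items() if heavy)
    den = sum(n * weights[g] for (g, _), n in counts.items())
    return num / den

weights = poststratification_weights(panel_counts, population_share)
truth = sum(population_share[g] * true_rate[g] for g in population_share)

print(f"unweighted panel estimate: {heavy_user_rate(panel_counts):.2f}")           # 0.84
print(f"age-weighted panel estimate: {heavy_user_rate(panel_counts, weights):.2f}")  # 0.80
print(f"hypothetical population value: {truth:.2f}")                                # 0.45
```

In this toy setup, weighting fixes the age mix but barely moves the estimate, because the bias lives inside each age group rather than in the group sizes.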

[Figure: Calibrating bias in online samples, psychographic issues]
[Figure: Calibrating bias in online samples, brand issues]

These findings were consistent with similar studies conducted by well-respected academic and industry groups:

  • In 2009, David Yeager and Jon Krosnick of Stanford University published a report recommending "caution before asserting or presuming that non-probability samples yield data that are as accurate or more accurate than data obtained from probability samples."
  • In 2013, the American Association for Public Opinion Research (AAPOR) published a report warning researchers about the use of online panels.
  • In 2016, a Pew Research study found widespread errors in online nonprobability survey estimates for Black and Hispanic respondents.

The appeal of convenience sampling is easily inferred from its name: faster turnaround at lower cost, i.e., convenient. But, as with many things that are faster and cheaper, the results rarely measure up to the real deal. Insights from convenience panels are far from representative of the entire population and run the risk of fueling poor downstream marketing decisions. Although convenience panels can attempt to recruit members so that key demographics (e.g., age, gender, region) match benchmarks like those from the US Census Bureau, this recruiting does not ensure that every person has a chance of joining a panel. And even with these recruitment quotas in place, convenience panels are widely known to under-represent important consumer groups like young males, minorities and populations with low income or education. A panelist’s underlying motivations for survey participation can further skew results. For example, a respondent may only accept invitations to surveys that focus on topics of personal interest. Or, he or she may provide false responses in order to qualify for participation and, ultimately, compensation (monetary or otherwise). All of these pitfalls muddy the waters on exactly who is being represented through this type of sample design.


Probabilistic Address-Based Sampling: The Gold Standard

The gold standard (and proven) sampling methodology is probabilistic and address-based. By design, it ensures everyone in the US has a known chance of being selected to participate in a survey. The sampling process begins with a comprehensive database of all household addresses provided by the US Postal Service, and typically allows only one randomly selected respondent per household. The result is a set of respondents designed to mirror the natural distribution of the population on measurable variables (like demographics, behaviors, and attitudes) as well as the many unknown characteristics typically not addressed by surveys (underlying motivation to participate, current mood, search engine history, etc.).
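The two-stage selection described above (a random draw of addresses, then one random respondent per household) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not MRI-Simmons' actual procedure: the address frame is a made-up dictionary, the function name is hypothetical, and real studies also have to deal with nonresponse, vacant addresses and stratification.

```python
import random

def draw_address_based_sample(frame, n_addresses, seed=0):
    """Two-stage draw: random addresses, then one random adult per address.

    `frame` maps each address to the list of adults living there
    (a hypothetical stand-in for a postal address file).
    """
    rng = random.Random(seed)
    # Stage 1: simple random sample of addresses, without replacement.
    sampled_addresses = rng.sample(sorted(frame), n_addresses)
    p_address = n_addresses / len(frame)

    respondents = []
    for address in sampled_addresses:
        adults = frame[address]
        # Stage 2: one adult chosen at random within the household.
        person = rng.choice(adults)
        # Every adult's chance of selection is known in advance.
        p_select = p_address * (1 / len(adults))
        respondents.append({
            "address": address,
            "person": person,
            "p_select": p_select,
            "design_weight": 1 / p_select,
        })
    return respondents

# Hypothetical frame: 1,000 addresses with one to three adults each.
frame = {
    f"{i} Main St": [f"adult_{i}_{j}" for j in range(1 + i % 3)]
    for i in range(1, 1001)
}
sample = draw_address_based_sample(frame, n_addresses=100, seed=1)
print(sample[0])
```

Because each selection probability is known, its inverse can serve as a design weight: an adult in a three-person household is less likely to be picked than someone living alone, and is weighted up accordingly.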

Importantly, this gold standard approach can use several modes of making initial contact with potential respondents (in person, by mail or by phone), and allows for flexible follow-up channels as well, including online. Whereas opt-in panels rely on the same or a similar set of available respondents many times, probabilistic address-based sampling requires a researcher to proactively reach out to potential respondents at random every time a study is fielded. When carried out across tens of thousands of respondents (as MRI-Simmons does many times a year), this approach also allows marketers to examine granular slices of the market without compromising the stability of insights.

Mitigating Marketing Risk

In an ideal world, marketers would have the time and money to invest in gold standard data each and every time an information need arose. For a variety of reasons, this isn’t always possible. It is worth noting, then, that savvy researchers can find ways to correct for some of the biases inherent to convenience sampling. At MRI-Simmons, we use our gold standard benchmarking dataset, gathered via probabilistic address-based sampling, to balance data from convenience panels by applying demographic, behavioral and attitudinal weights that match the results of our truth set. This is possible because probabilistic address-based sample design captures the true diversity of the US consumer market, and thus helps mitigate biases that emerge in stand-alone data from online panels.
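The article does not say which weighting algorithm is used. One common way to balance a panel against benchmark margins on several variables at once is raking (iterative proportional fitting), sketched below in Python as an illustration only. The panel records, the variable names (`age`, `streams`) and the target shares are all hypothetical; in practice the targets would come from the probability-based benchmark dataset.

```python
def rake(respondents, targets, iterations=50):
    """Iteratively adjust weights so weighted shares match each target margin.

    `targets` maps a variable name to {category: benchmark share}.
    Assumes every target category appears at least once in `respondents`.
    """
    weights = [1.0] * len(respondents)
    for _ in range(iterations):
        for variable, target_shares in targets.items():
            total = sum(weights)
            current = {}
            for w, r in zip(weights, respondents):
                current[r[variable]] = current.get(r[variable], 0.0) + w
            # Scale each respondent by (target share / current weighted share).
            weights = [
                w * target_shares[r[variable]] * total / current[r[variable]]
                for w, r in zip(weights, respondents)
            ]
    return weights

def weighted_share(respondents, weights, variable, category):
    """Weighted share of respondents falling in `category` of `variable`."""
    hit = sum(w for w, r in zip(weights, respondents) if r[variable] == category)
    return hit / sum(weights)

# Hypothetical convenience panel: one demographic and one behavioral variable.
panel = (
    [{"age": "18-44", "streams": "yes"}] * 50
    + [{"age": "18-44", "streams": "no"}] * 20
    + [{"age": "45+", "streams": "yes"}] * 20
    + [{"age": "45+", "streams": "no"}] * 10
)
# Hypothetical benchmark margins from a probability-based "truth set".
targets = {
    "age": {"18-44": 0.45, "45+": 0.55},
    "streams": {"yes": 0.50, "no": 0.50},
}

raked = rake(panel, targets)
print(round(weighted_share(panel, raked, "age", "18-44"), 3))
print(round(weighted_share(panel, raked, "streams", "yes"), 3))
```

Adding behavioral and attitudinal margins to the targets, rather than demographics alone, is what lets this kind of weighting address biases that demographic weights miss. It can only correct for the variables that are actually included in the targets.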

A builder can’t construct a durable house on unstable ground, and a marketer cannot build successful strategies based on questionable data. Suboptimal sampling compromises the ability to get a true picture of audience size, brand preferences, spending power, and countless other measures central to marketing plans. This leads to incomplete, and even erroneous, conclusions about a consumer target. In these times of intense change, brands that prioritize a sound data source will be best equipped to get consumer insights right and nimbly pivot to stay connected and afloat no matter what comes next.

Matt Cumello
Matt Cumello is Senior Director of Marketing for MRI-Simmons, responsible for marketing strategy and execution. Matt has over 19 years of B2B marketing experience, having worked with both start-ups and established companies in market research and technology.