Lady Gaga, the joy of data and why it pays not to be seduced too easily
A view from Richard Shotton

Marketers can use observed data to see how people actually think and behave. Just remember there is bias in every data set, Manning Gottlieb OMD's Richard Shotton argues.

When trying to understand their customers, marketers no longer have to rely on claimed data. That’s an improvement – claims can be misleading.

Sometimes people lie and sometimes they simply don’t know the genuine reasons behind a purchase. As David Ogilvy put it, "Consumers don’t think how they feel, say what they think or do what they say."

One of the richest sources of observed data is search. Not the elements that we normally associate with PPC campaigns, such as cost per click and click-through rates, but the search strings themselves.

Analysing search patterns reveals how people actually think and behave.

Consider gender expectations. According to the data scientist and New York Times journalist Seth Stephens-Davidowitz, there are 2.5 times as many searches for "Is my son gifted?" as for "Is my daughter gifted?"

Mayur Vohra and I checked UK data and found the same pattern. The relative volumes reveal an uncomfortable truth: that parents prioritise the intellectual development of their sons. Good luck getting people to admit that in a survey.

Consumer insights as well as cultural ones

It’s not just cultural insights that you can glean from search, but commercial ones too.

Take vitamins. If you analyse the most popular search strings, you see that few people search for vitamins by their letter: Vitamin A, Vitamin B or Vitamin C.

Instead, they search by the problems vitamins solve: which vitamins ward off a cold, or help muscle growth?

That’s an easy opportunity to seize: stop advertising and packaging vitamins by their letter and instead communicate the problem they solve. Not only will you work with human nature, rather than against it, but you’ll also be distinctive, as most brands sell their vitamins by the letter.

Search is a powerful indicator of behaviour because of the incentives involved: search terms must accurately reflect your needs, otherwise you’ll get the wrong results.

Compare that to a survey. When a researcher approaches you in the street there’s a subtle pressure to answer in a way that reflects well on you. We all want to appear positively to others.

Search isn’t the only observed data set that marketers can use. The other big one is social: social listening or analysing the profile of a brand’s Facebook fans.

With Facebook set to launch their own social listening tool, there’s potential for these data sources to become widely adopted.

Caveat emptor

But before we celebrate the power of data prematurely we should heed the salutary story of StreetBump, as told by Tim Harford.

The StreetBump app was a smart piece of kit – it harnessed the accelerometer in a smartphone to identify when a car hit a pothole.

When this happened, the app messaged the Boston traffic authorities, alerting them to the location of the hazard.

The authorities welcomed the app as the previous approach was for officials to drive randomly around the city searching for potholes. A time-consuming and costly tactic.

However, a year later the problems with the new approach surfaced. The affluent areas of Boston had smoother roads than ever but the parts with poorer or older residents fared worse.

Since fewer people had downloaded the app in those areas it appeared they had fewer potholes and, therefore, the council sent fewer repair trucks to their neighbourhood. The authorities had incorrectly assumed that the data was representative.
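
The StreetBump problem can be illustrated with a minimal Python sketch. The district names, report counts and app-user counts below are invented for illustration and are not Boston's figures, and the helper reports_per_user is hypothetical. The sketch shows how ranking districts by raw report counts can put a district last, while dividing by the number of app users puts it first.

```python
# Hypothetical figures for illustration only; not real Boston data.
districts = {
    "affluent": {"reports": 120, "app_users": 4000},
    "poorer": {"reports": 30, "app_users": 500},
}


def reports_per_user(stats):
    """Pothole reports normalised by the number of app users in a district."""
    return stats["reports"] / stats["app_users"]


# Ranking by raw report counts, which is what treating the data as
# representative amounts to.
raw_ranking = sorted(
    districts, key=lambda d: districts[d]["reports"], reverse=True
)

# Ranking after adjusting for how many people in each district use the app.
adjusted_ranking = sorted(
    districts, key=lambda d: reports_per_user(districts[d]), reverse=True
)

print("By raw reports:", raw_ranking)
print("By reports per app user:", adjusted_ranking)
```

Dividing by app users is only one possible adjustment, and it assumes app users drive similar distances in each district. The point is simply that the raw counts measure app adoption as much as they measure potholes.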

Advertising is making the same mistake. Take Facebook Insights. This system allows you to quickly and easily understand the demographics and interests of a brand’s Facebook fans.

According to Facebook's data, 86% of Lady Gaga fans are female. If you accepted that at face value, you would ignore men when trying to sell her albums. But is that correct?

Spotify streaming data tells a different story; according to that, only 56% of listeners are women.

The Facebook data harbours a bias. It doesn’t capture all of Lady Gaga’s fans but merely those willing to publicly admit their fandom. Social media data reflects what people want the world to think about them, not their actual behaviours.

If the problems were isolated to a few Lady Gaga fans, then advertising could rest easy. But many of the data sources we rely on are flawed.

For example, social listening systematically exaggerates the importance of brands in consumers’ lives. Only those with a strong opinion bother to tweet about a brand.

The vast indifferent majority are rendered invisible, because they don’t care enough to tweet.

Perhaps most dangerous of all is our naïve approach to digital attribution.

Too many brands assume that it captures all the factors leading to a sale. Unfortunately, its short-term bias means it under-weights long-term factors.

Refining campaigns on this data alone leads us into a downward spiral where we choose sales over saleability.

We’ve been seduced by digital data. We’re so excited by the novelty and scale of the new data sources at our disposal that we have been blind to their limitations. By interpreting unrepresentative data uncritically, we make flawed decisions.

What should we do?

First, recognise that every data source gives you a partial view of consumers. Brands shouldn’t rely on one approach when trying to understand consumers.

Instead, they must use multiple data sets – a broad mix including both claimed and observed approaches.

When different techniques agree you can have more faith in the findings and when they disagree you need to concoct an explanatory hypothesis, which can in turn be tested.
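
As a rough sketch of that cross-check in Python, the snippet below compares the two Lady Gaga figures quoted earlier (86% female on Facebook, 56% on Spotify). The 10-point tolerance and the helper sources_agree are assumptions made for illustration, not an established rule.

```python
# Share of Lady Gaga's audience that is female, as quoted in this article.
estimates = {"facebook_fans": 0.86, "spotify_listeners": 0.56}

# Assumed threshold: sources further apart than this call for an explanation.
TOLERANCE = 0.10


def sources_agree(values, tolerance):
    """Return True if the highest and lowest estimates are within tolerance."""
    values = list(values)
    return max(values) - min(values) <= tolerance


spread = max(estimates.values()) - min(estimates.values())

if sources_agree(estimates.values(), TOLERANCE):
    verdict = "agree, so the finding deserves more confidence"
else:
    verdict = "disagree, so a hypothesis is needed to explain the gap"

print(f"Gap between sources: {spread:.0%}. The sources {verdict}.")
```

The code cannot supply the explanation itself. It only flags when one is needed, which is the moment to ask what each source is really measuring.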

Second, admit that all data is biased. It’s our job to recognise that bias and adjust for it. That’s one of the strengths of claimed data.

Every decent researcher knows that surveys and focus groups are biased, and they correct for it. We need to apply that scepticism to all data sets.

Finally, we should be more careful matching the data set to the task. Think of the Lady Gaga example.

Which data set should we trust? Spotify or Facebook? It depends on the task. If you are trying to sell concert tickets, then Facebook data is ideal, but if you want to sell albums the Spotify data is preferable.

There is no perfect data set, right for all occasions. We need to use our judgement and select the right one for the right task.

Richard Shotton is deputy head of evidence at Manning Gottlieb OMD and is on Twitter at @rshotton