Media Perspective: Want to understand internet behaviour? Get a degree in stats

Facebook is launching a thing called Facebook Questions. It's "a beta product that lets you pose questions to the Facebook community".

Interesting, you might think, but not revolutionary. Then you notice that there are around 50,000 beta testers. That's a lot of people. And then you read that Twitter just handled its 20 billionth Tweet and someone recently dumped 100 million Facebook records on BitTorrent. You sit back and realise that we live in an age of extremely large numbers: an era in which human activity is measurable and trackable in immense detail, and all the data is easy to find and manipulate. But then, of course, you ask yourself: what do we do with all this information?

Social scientists haven't stopped to think for too long - they've dived right in and are doing real-world experiments with millions of participants. A recent article in New Scientist quotes Albert-Laszlo Barabasi of Northeastern University in Boston, Massachusetts: "The data revolution is here for social science: for the first time, scientists have a chance to study what humans do in real time and in an objective way. It's going to fundamentally change all fields of science that deal with humans."

There have, for instance, been a number of studies devoted to working out what makes something popular - using songs from unknown bands to look at music preference, following the diffusion of Facebook applications to understand the interplay between individual and social choices, and mining Twitter buzz to predict movie box-office takings. All fascinating stuff, and, as with any new technology, it's not taken very long for it to be commercialised and brought to the marketplace wrapped in shiny new boxes and freshly minted jargon.

There are now dozens of companies selling their own ways to mine, track and understand these very large datasets of people doing and saying stuff on the internet. You can do Sentiment Analysis or Opinion Mining. You can have an Analytics Dashboard and an Engagement Console. These tools can be extremely useful and they can sometimes lead you astray. Yet the whole field reminds me of the stop-go development of artificial intelligence. AI researchers talk about periodic "AI winters" - long periods following a flurry of hype and expectation when the funding dries up and the public spotlight moves elsewhere.

I suspect we'll see similar bubbles of expectation and disappointment in the study and modelling of internet behaviour and opinion. This will be tricky for the sentiment analysts and opinion miners, but will also be awkward for the rest of us trying to buy these services for ourselves and our clients. Like I keep saying - it's time to go back and get a degree in statistics.