Domopalooza 2017: Mo’ Data Mo’ Problems
Salt Lake City—Ninety percent of the world’s data was created in the last two years.
That’s a staggering amount, and it’s easy to be impressed when looking at a statistic like that. But don’t get too caught up in those numbers, said Nate Silver, statistician and editor-in-chief of FiveThirtyEight, Thursday morning at Domopalooza 2017.
“I certainly don’t think 90 percent of the world’s useful knowledge has been created in the past two years,” he said. “Maybe .9 percent, if we’re lucky.”
Such a huge volume of data, with storage capacity growing exponentially over time, creates a problem: what do you actually do with all of it? And how do you analyze it correctly or apply it in a positive and productive manner? Many assume that collecting more data immediately leads to more accurate predictions and faster insights, but you don’t have to look much further than the 2016 election to see that more data does not always mean accurate prediction.
“Big data” is the current buzzword in business—which Silver said makes plenty of sense. Every new technology is hugely hyped as soon as it begins to be accepted, and expectations of that tech skyrocket. Over time, when people realize “it’s not as easy as pressing a button and having all your problems solved,” said Silver, the hype crashes into what he calls a “trough of disillusionment.”
After that, expectations become more realistic, the technology is put to use, and people climb a “slope of enlightenment” onto a “plateau of productivity,” where the technology is examined and employed to its highest potential.
So, where are we in that cycle right now with big data?
Peak hype, said Silver.
“We’re at an interesting inflection point right now where people might know enough to be dangerous,” said Silver.
On election night, FiveThirtyEight’s model predicted that Trump had a 30 percent chance of winning: low, but still far higher than the 0.5 percent chance other models forecast.
“Those are really, really different forecasts!” he said. “People aren’t aware that looking at the data set in slightly different ways can give you different answers.”
Human desires, too, factor into the way data is processed. Silver pointed out that on a normal night, only 5 percent of Fox News viewers are Democrats, and only 1 percent of MSNBC viewers identify as Republicans. That partisan split shapes how each audience interprets data.
“You have the fact that a lot of this data is analyzed in a hyper-competitive and very partisan environment, where people are increasingly finding ways to create their own realities. … Long before the whole idea of ‘fake news’, you had people existing in their bubbles,” said Silver.
Part of the issue, he continued, is that there is simply more data right now than established practices can keep up with.
“There’s way more information than people can process by themselves, and we don’t have good strategies for dealing with data sets this big. There are tech aspects of this problem, too, but philosophically, we just aren’t used to being bombarded with so much info, and we don’t have good strategies for adapting to it,” he said.
Still, as a statistician, Silver does have advice on building habits for examining and properly using big data, so that we can eventually reach that slope of enlightenment. Every field, from weather to finance, can and should use big data, but recognizing its limitations and adopting better practices can keep companies (and weather forecasters) from huge missteps.
First, analysts should think probabilistically when they consider their data. Silver gave the example of a weather forecast that told the people of a North Dakota city a flood was coming, noting the water could crest as high as 49 feet. With the levees at 51 feet, the citizens and government of the city assumed they were properly prepared, only to have the waters crest at 53 feet. The forecasters knew their margin of error was plus or minus nine feet, but didn’t include it in the forecast. Millions of dollars in uninsured property were lost.
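To see why the missing margin of error mattered, here is a rough sketch of the arithmetic in Python. It is an illustration, not the forecasters’ actual method: it assumes the forecast error is normally distributed and reads “plus or minus nine feet” as the half-width of a central interval whose coverage is unknown, so several coverage values are tried. The function name and its parameters are hypothetical.

```python
from statistics import NormalDist


def exceedance_probability(forecast_ft, levee_ft, margin_ft, coverage=0.90):
    """Chance that the crest tops the levee.

    Assumptions (not from the article): forecast error is normal, and
    margin_ft is the half-width of a central interval with this coverage.
    """
    # z-score that bounds a central interval with the given coverage
    z = NormalDist().inv_cdf(0.5 + coverage / 2)
    # Convert the stated margin into a standard deviation
    sigma = margin_ft / z
    # Probability mass above the levee height
    return 1.0 - NormalDist(mu=forecast_ft, sigma=sigma).cdf(levee_ft)


# Figures from the article: 49-foot forecast, 51-foot levees, 9-foot margin
for coverage in (0.68, 0.90, 0.95):
    p = exceedance_probability(49, 51, 9, coverage)
    print(f"coverage {coverage:.0%}: P(crest > 51 ft) = {p:.0%}")
```

Under any of these assumed readings, the chance of water topping the 51-foot levees comes out at roughly one in three or higher, which is far from the safety that a bare “49 feet” suggests.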
Now, said Silver, forecasters are more careful to visualize uncertainty. Weather forecasters iterate and are able to learn quickly from their mistakes. Iteration, Silver said, is a great habit for any analyst to learn.
“You really have the combination of big data with science, and the human side of it, too, has led to big gains,” he said.
Second, Silver said it’s important to know what your perspective is. There is an objective truth, he said, but all our viewpoints are subjective, so soliciting alternate viewpoints is critical when hunting for the truth, for better analytical practices, and for better forecasts. Diverse, independent and decentralized viewpoints are essential to success.
Lastly, Silver said: go forth and err.
“The point is, when you’re working with big data sets, it often follows this process where you begin to get a few basic, major things right, and it improves accuracy a lot. … Then it becomes more difficult. Increasingly, you have diminishing returns. But people should not be afraid of making iterative improvements around the margin, especially in the tech space. In truly big data environments, there is no substitute for testing your data on real-world customers,” he said, adding that big tech unicorns like Google and Facebook tend to have an iterative and unstubborn approach in terms of investing in technologies and ideas. “Because we don’t know what works, and we’re dumb individually, we can collect a lot of knowledge from trial and error of learning through real time.”