All these articles about Big Data remind me of a story.
What’s that? You don’t know what Big Data is? Frankly, I’m not all that sure myself. I've been reading about big data for the past year, and I still don't quite understand it. It shows up in business magazines, logistics management magazines, and countless advertisements. IBM is on a Big Data kick too, advertising how Big Data can become the engine of a “smarter planet.”
Intel co-founder Gordon Moore described a trend in a 1965 paper, noting that the number of components in integrated circuits had doubled every year since the integrated circuit was invented in 1958. This observation became what we know as Moore's law. Today it is usually stated as the number of transistors on a chip doubling roughly every two years; the popular claim that computing performance doubles every 18 months is a later variant, often attributed to Intel executive David House.
For the past five decades, we’ve witnessed the geometric growth of computing power and a concomitant geometric reduction in cost. Storage capacity has grown at a similarly relentless pace, and so has the role computers play in our everyday lives. Five decades ago, computers were massive machines that did little but solve mathematical equations. Today, computers that fit in the palm of a hand can help us find the closest taco stand.
Gordon Moore himself did not claim that the doubling would continue indefinitely. In his article "Cramming More Components onto Integrated Circuits," published in the April 19, 1965, issue of Electronics magazine, he predicted that the annual doubling would continue for at least 10 years, and he thought that over the short term the rate might even increase. In 1975 he revised his estimate to a doubling every two years. Moore projected that by 1975 an integrated circuit would hold 65,000 components and could be built on a single wafer. Today's processors contain billions of transistors. Tomorrow?
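The difference between doubling every year and doubling every two years is larger than it sounds. The Python sketch below illustrates the compounding arithmetic; the 1965 starting point of 64 (that is, 2^6) components is an illustrative assumption of ours, not a figure from Moore's paper, and the `components` function is a hypothetical helper.

```python
# A minimal sketch of the compounding behind Moore's 1965 projection.
# Assumption (illustrative, not from the article): 64 components per chip in 1965.

def components(start: int, start_year: int, year: int, years_per_doubling: float = 1.0) -> float:
    """Components per chip in `year` if the count doubles every `years_per_doubling` years."""
    doublings = (year - start_year) / years_per_doubling
    return start * 2 ** doublings


if __name__ == "__main__":
    # Ten annual doublings: 64 * 2**10
    print(components(64, 1965, 1975))  # 65536.0
    # The same starting point doubling every two years gets only five doublings: 64 * 2**5
    print(components(64, 1965, 1975, years_per_doubling=2.0))  # 2048.0
```

Ten annual doublings land almost exactly on Moore's 65,000 figure, while a two-year doubling period over the same decade yields about 2,000 components.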
With increases in computational power and storage, and the ability to automate data collection, the world collects ever more data. International Data Corporation (IDC) released a study in May 2010 that projected the amount of data created each year would grow roughly 45-fold by 2020. IDC continues to study this digital universe. In its updated 2011 study, IDC projected that the amount of information created and replicated in 2011 alone would surpass 1.8 zettabytes (1.8 trillion gigabytes). With the growth of cloud computing, digital delivery of video and music, blogs, and business data, such projections may turn out to be a bit conservative.
So the collection, transmission, and storage of data continue to explode. Is this what everybody means by big data? Perhaps. Clearly, in our rapidly digitized world, we’ve become more dependent upon technology and the data that drives it. But that still doesn't answer the question, "What is big data?"
I talked to a friend of mine who is an information technology genius. I don't use this term lightly; my friend works with data, coming up with ways to encrypt and decrypt it. I asked him the question, “What is big data?”
His answer is fascinating, but incomplete.
Big data, he said, is more than just a massive collection of data. At bottom it is nothing more than zeros and ones, binary on-off switches. Okay, I knew that. The challenge of working with big data, he said, is to understand the context. Okay, so it's important to understand what the data means. No, that wasn't quite right either.
My friend asked me how satisfied I typically am with Google searches. I use Google a lot, and I suspect most of you do, too. There are times when I can find what I'm looking for very quickly, such as a map. Just type in the street address and Google pretty much nails down a map, as long as the address is in the United States. But how satisfied are we with more complex questions, such as, “How can the net amount of entropy of the universe be massively decreased?” Try searching that in Google. You won't get an answer, but you will get plenty of references to the Isaac Asimov short story, "The Last Question."
Sometimes the effort required to find what you need in a Google search depends on how clever you are at phrasing the search terms. If you've ever searched for photographs or charts in Google, you may share my frustration when you discover that your ability to find the right photograph depends either on how the author of the photograph named the file or on whether they went to the trouble to tag the image. Google doesn't understand context. My friend tells me that Google is trying really hard, but it hasn't cracked the code yet.
Big data is large piles of data that are difficult to analyze. Big data is data, not information. Big data is a term that perhaps is more hype than reality—a promise, not a panacea.
The easiest excuse for a manager’s inability to make a decision is an apparent lack of data. When managers are not comfortable making a decision, they ask for more data. They claim that there is insufficient data to answer the question or make the decision.
How many times have you used that excuse?
If you think about the use of data processing in business management, the promise is that more data leads to better decisions. To cash in on this promise, businesses continue to invest in information technology to collect more data, automate processes, and automate data collection itself, all in the hope of making better decisions.
Some believe that it is an empty promise.
I could be one of those people. The question I always ask is, “Do we have the right data?” This cuts to the issue of quality versus quantity. We can have a pile of data, but if it is inaccurate, it is useless. We can have a little bit of data, a minuscule amount, but if it is perfectly accurate, it can be unbelievably useful. We can also have a little bit of data that is perfectly accurate but doesn't help us answer the question we are asking, and is therefore useless.
Does more data lead to better decisions? Many business publication writers and software vendors say so. The October 2012 Harvard Business Review article, “Big Data: The Management Revolution,” attempts to make the case that big data can radically improve company performance. However, in the April 2012 edition of HBR, the article “Good Data Won't Guarantee Good Decisions” makes the counterpoint: “At this very moment, there’s an odds-on chance that someone in your organization is making a poor decision on the basis of information that was enormously expensive to collect.”
Oh Boston, I think we have a problem.
I opened this article by saying that the idea of big data reminds me of a story. The story is Isaac Asimov's "The Last Question," which traces the development of computers and their relationship with humanity. In each of the first six scenes, the characters ask their computer some form of the same question: can the entropy of the universe be reversed, so that the universe need not run down? Each time, the computer replies that there is insufficient data for a meaningful answer. Eventually humanity and the computer merge into a single entity. I suggest you invest the 28 minutes it takes to listen to this great reading of the story to hear the computer’s final answer to the last question.
Doing so will help set the stage for our future study.