I have been reading, with my personal skepticism filter on, about the latest and greatest in "Big Data." Big Data is generally defined as the ability to collect enormous volumes of data and to use that data to make decisions. A new type of position, the data engineer, has been created to oversee this significant increase in data volume, the physical maintenance of the hardware, and the creation of relevant datasets.
Our ability to collect data has increased exponentially. The cost of storage has dropped dramatically in the past 10 years. We have access to more varying types of data than ever before. Yet, with all this data, has our decision-making ability improved in proportion to our ability to store it? I would contend absolutely not.
To start answering that question, I would first ask you to read Caribou Honig's guest blog post on Forbes. The post's contention, and rightly so, is that Big Data should not be the focus. We need to focus on "Better Data." As quality professionals we have espoused this concept for years. If we do not analyze the data germane to the issue at hand, we will not reach the appropriate decision or solve the problem that is pertinent to the current state of reality.
So how does one create Better Data? The days of dumping the entire database into a spreadsheet to create that magical pivot table are over. Here is my strategy for dealing with huge databases:
· Understand the data elements that are in the database
· Determine the data elements you need to evaluate, based on the problem statement or business objective
· Create a list of expectations for what you need from the data
· Validate the data elements against the process from which the data comes
· Communicate the expectations to the data miner or database guru
· Test drive the dataset using the expectations list
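As a sketch of the last two steps, here is one way an expectations list might be turned into a simple test drive of the dataset. Everything here — the field names, the records, and the rules — is hypothetical, purely to illustrate the idea:

```python
# Hypothetical inspection records pulled from a database extract.
records = [
    {"lot_id": "A1", "defect_count": 3, "inspected": 120},
    {"lot_id": "A2", "defect_count": 0, "inspected": 115},
    {"lot_id": "A3", "defect_count": -1, "inspected": 110},  # suspect row
]

# The expectations list, written as one rule per field.
expectations = {
    "defect_count": lambda v: isinstance(v, int) and v >= 0,
    "inspected": lambda v: isinstance(v, int) and v > 0,
}

def check_expectations(rows, rules):
    """Return (row id, field) pairs that violate any expectation."""
    failures = []
    for row in rows:
        for field, rule in rules.items():
            if not rule(row.get(field)):
                failures.append((row["lot_id"], field))
    return failures

print(check_expectations(records, expectations))  # [('A3', 'defect_count')]
```

If the test drive comes back clean, you have some evidence the dataset matches the expectations you communicated; if not, you have a concrete list to take back to the database guru.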
You will notice that this type of exercise requires some skills that are often overlooked. First, you have to know what your process produces and how the data represents that process. I have seen far too many processes being managed by data that has no connection to process inputs or outputs. Next, you have to be somewhat skilled in segmentation. Segmentation is not just splitting the data; there has to be logic behind the segments. The ideal collaboration for data segmentation brings together the process subject matter expert, the database guru, and the analyst. Data segmentation is crucial because it can provide atypical views that offer different perspectives on problems or decisions.
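To illustrate what I mean by logic behind the segments, here is a small sketch. The attribute names and numbers are made up; the point is that the grouping key ("shift") comes from process knowledge supplied by the subject matter expert, not from an arbitrary split of the rows:

```python
from collections import defaultdict
from statistics import mean

# Hypothetical measurements tagged with the process attribute the
# subject matter expert believes drives the variation.
measurements = [
    {"shift": "day", "cycle_time": 4.1},
    {"shift": "day", "cycle_time": 3.9},
    {"shift": "night", "cycle_time": 5.2},
    {"shift": "night", "cycle_time": 5.0},
]

def segment_by(rows, key, value):
    """Group a value column by a process-meaningful attribute."""
    segments = defaultdict(list)
    for row in rows:
        segments[row[key]].append(row[value])
    return segments

segments = segment_by(measurements, "shift", "cycle_time")
for name, values in segments.items():
    print(name, round(mean(values), 2))
```

A pooled average would hide the gap between the segments; the segmented view is the atypical look that may change the conversation about the problem.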
Another type of knowledge is statistical intent. The typical statistical knowledge of a data analyst usually stops at that first college-level stats class. Data analysts forget the power of visualization, and they overlook that data dumps are often static in nature. Unless you understand the type of data you have, even a large volume of it may amount to a single data point: you end up making assertions from only one perspective.
I do see one positive application of Big Data: greater ease of statistical modeling. To me, this is the one area not readily utilized by the Quality profession. Having the data makes it far easier to model process behavior, if only we use it. Statistical modeling is a powerful way to pilot changes and predict future performance.
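As a sketch of the idea, here is a least-squares line fit over hypothetical historical process data, used to predict performance at a setting we have not yet tried. The variable names and every number are invented for illustration:

```python
# Hypothetical history: a process input (temperature) and its output (yield).
temps = [20, 25, 30, 35, 40]
yields = [88.0, 89.5, 91.0, 92.5, 94.0]

# Ordinary least-squares fit of yield = intercept + slope * temp.
n = len(temps)
mean_x = sum(temps) / n
mean_y = sum(yields) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(temps, yields))
         / sum((x - mean_x) ** 2 for x in temps))
intercept = mean_y - slope * mean_x

# Predict performance at an untried setting before piloting it.
predicted = intercept + slope * 45
print(round(slope, 2), round(predicted, 1))  # 0.3 95.5
```

A model like this lets you pilot a change on paper first, then confirm it on the floor, which is exactly the kind of due diligence the Body of Knowledge asks of us.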
The Big Data drive opens up some huge opportunities for the Quality professional. It is a road with potholes. But if we trust our Body of Knowledge and use the tools appropriately, we can increase the quality of business decision making. Just make sure you do your due diligence. Until next time!