Big Data is the holy grail of analytics today. With it, organizations can apply predictive and user-behavior analytics, and discover patterns, trends, and associations that were once impossible to uncover. Big Data leads to better decision making, which in turn leads to improved operational efficiency, reduced risk, and bottom-line cost savings. But Big Data can be difficult to attain. A lot of work must be done before organizations can realize its benefits. Before you can get Big Data, you need clean data.
Issues such as duplicate records, incorrect numbers, missing characters, missing data fields, data associated with assets no longer in service, and multiple numbers associated with one asset can corrupt data, making it inconsistent and inaccurate. Data cleansing, reconciliation, and Master Data Management (MDM) are critical to achieving clean data, but they can be seen as time-consuming and costly endeavors with little short-term return.
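To make these issues concrete, here is a minimal sketch of how a legacy-data review might flag two of them, duplicate asset numbers and missing fields, using hypothetical records with made-up "asset_id" and "serial" fields:

```python
from collections import Counter

# Hypothetical legacy asset records; field names are illustrative only.
records = [
    {"asset_id": "A-100", "serial": "SN-001"},
    {"asset_id": "A-101", "serial": ""},        # missing serial number
    {"asset_id": "A-100", "serial": "SN-003"},  # same asset number twice
]

# Flag asset IDs that appear more than once (duplicate data).
id_counts = Counter(r["asset_id"] for r in records)
duplicates = [aid for aid, n in id_counts.items() if n > 1]

# Flag records with an empty required field (missing data fields).
incomplete = [r["asset_id"] for r in records if not r["serial"]]

print(duplicates)  # → ['A-100']
print(incomplete)  # → ['A-101']
```

A real review would run checks like these across every critical field, but the idea is the same: enumerate the error types first, then scan for each one systematically.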
Review your Data
So how do organizations begin gathering and cleaning data along the path to Big Data? In an article about clean data, Patrick Gray, a leading technology expert and consultant, advises: “Start with the problems you expect Big Data to solve, the benefits of gaining the rapid responses and refinements characteristic of Big Data, and then compare the costs of repeatedly performing cleaning versus biting the bullet and doing it right the first time.”
As Gray suggests, the first step is to identify what data you have and what you need to achieve your Big Data goals. This includes MDM activities such as data reconciliation or a complete audit of inventory. It typically involves examining current records within a database and confirming the information held in the legacy database is correct.
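At its core, reconciliation is a comparison between what the legacy database says you have and what an audit actually finds. A hedged sketch of that comparison, using hypothetical asset IDs, might look like:

```python
# Asset IDs recorded in the legacy database (illustrative values).
db_assets = {"A-100", "A-101", "A-102"}

# Asset IDs confirmed during a physical inventory audit.
audit_assets = {"A-101", "A-102", "A-103"}

# In the database but not found in service: candidates for retirement.
ghost_records = db_assets - audit_assets

# Found in service but never recorded: candidates for entry.
unrecorded = audit_assets - db_assets

print(sorted(ghost_records))  # → ['A-100']
print(sorted(unrecorded))     # → ['A-103']
```

Each discrepancy then needs investigation before the record is corrected, but the set comparison tells you exactly where to look.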
This may be a lengthy process, but one that will reap benefits in the end. Still, don’t bite off more than you can chew. Gray states that “some early, small successes are far better than getting caught in the weeds of trying to solve all your data problems at once and never actually delivering any value.”
Keeping Data Clean
Once your legacy data is clean, how do you ensure it stays clean, and that new data is clean going forward? Again, it goes back to understanding your overall goals for Big Data analytics.
Make sure the data you’re collecting is what you need for analysis and that you aren’t capturing irrelevant data based on past practices. This could mean simplifying the data you’re gathering, such as removing unnecessary fields. More isn’t always better: adding extra fields and functions to software can slow down the timely analysis you’re looking to achieve.
Develop Data Gathering Policies
From what data to gather to the proper method of collecting it, consistency is key to data quality. Agree on the MDM data fields that are most important to your analysis, such as part numbers, model numbers, and serial numbers. Then use consistent tools and methods to collect that data. Automatic data capture systems, such as bar code labels and scanners, are the most reliable methods of capturing data. They leave little room for error, unlike manual data collection, where it’s easy to miss fields or transpose numbers.
Reviewing data allows you to identify common errors or pinpoint the areas where errors typically occur. Investigate and correct all data errors before they are entered into the system and develop your own policies and best practices to ensure the errors do not continue.
While many organizations have gotten by with messy, incomplete, or incorrect data in the past, the push toward Big Data makes clean data a prerequisite. If your organization is shifting toward instantaneous data analysis, clean data is the fundamental first step. For assistance getting or maintaining clean data, contact Camcode.