A long time ago, when a manager I liked very much said “we would like to train you on Data Quality”, I said “I would be very happy”. But then a very good question came to my mind: “What is Data Quality?”
Over the years I have seen that awareness of this issue has not yet fully matured, even in very large corporations, and has only recently begun to emerge despite the many problems poor data causes. Realizing this came as a relief to me.
Now, after many years of experience and many projects I have been involved in within the field of data quality, I will try to answer this question.
Well, what is Data Quality?
Let me answer this question with a definition first, and then explain its key terms in detail:
We can define Data Quality as the entirety of the Processes carried out with the help of Analyses performed on data within the organization, in line with the Business requirements of the relevant units and departments and with the different Metrics and Dimensions identified as a result of these analyses.
I know this is a somewhat complicated definition, so let us go through each term in more detail:
Business requirements: It is commonly assumed that IT is responsible for making and managing the relevant decisions, particularly those concerning data quality. However, all decisions made about data within the organization should be the responsibility of the business, while ensuring the correct execution of all processes should be the responsibility of IT. At this point, understanding the needs of the relevant business units is of great importance.
Analysis: Perhaps the most important part of data quality is the analysis work that should be performed on the data once the business needs have been understood correctly and clearly. At this point, the problems in the data should be identified with the help of many different kinds of analysis (frequency, pattern, statistical, ad-hoc analysis, etc.).
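To make the analysis types above a little more concrete, here is a minimal sketch of frequency, pattern, and statistical analysis on a single column, using only the Python standard library. The function names (`value_pattern`, `profile_column`, `summarize_numbers`) and the sample values are hypothetical and chosen only for illustration; real profiling tools do far more than this.

```python
from collections import Counter
import statistics


def value_pattern(value: str) -> str:
    """Map digits to '9' and letters to 'A'; keep other characters as they are."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("9")
        elif ch.isalpha():
            out.append("A")
        else:
            out.append(ch)
    return "".join(out)


def profile_column(values):
    """Frequency and pattern analysis for one column of string values."""
    non_null = [v for v in values if v is not None and v != ""]
    return {
        "frequency": Counter(non_null),
        "patterns": Counter(value_pattern(v) for v in non_null),
        "null_count": len(values) - len(non_null),
    }


def summarize_numbers(values):
    """Basic statistical analysis for one numeric column."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }


# Hypothetical sample data: a phone column and an age column.
phones = ["555-1234", "555-9876", "5551234", None, "555-1234", "n/a"]
ages = [34, 29, 41, 38, 230]

profile = profile_column(phones)
stats = summarize_numbers(ages)

print(profile["patterns"].most_common())
print(stats)
```

In this sample, the pattern counts show that most phone values share one shape while a few deviate, and the gap between the mean and the median of the ages hints at an outlier. Findings like these are what the later steps build on.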
Metrics and Dimensions: The analyses carried out make it possible to assess the data along different dimensions with the help of different metrics.
Data Quality Metrics
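As a rough illustration of how a metric can measure a dimension, the sketch below computes three ratios for one column. The dimension names used here (completeness, uniqueness, validity) are commonly used examples and are my assumption, since the article does not list specific dimensions; the function names and the deliberately simplistic email pattern are hypothetical as well.

```python
import re


def _present(values):
    """Values that are neither None nor blank."""
    return [v for v in values if v is not None and str(v).strip() != ""]


def completeness(values):
    """Share of all values that are present."""
    if not values:
        return 0.0
    return len(_present(values)) / len(values)


def uniqueness(values):
    """Share of present values that are distinct."""
    present = _present(values)
    if not present:
        return 0.0
    return len(set(present)) / len(present)


def validity(values, pattern):
    """Share of present values that fully match the given regular expression."""
    present = _present(values)
    if not present:
        return 0.0
    regex = re.compile(pattern)
    return sum(1 for v in present if regex.fullmatch(v)) / len(present)


# Hypothetical sample column and an intentionally simple email pattern.
emails = ["a@example.com", "b@example.com", "", "a@example.com", "not-an-email"]
EMAIL_PATTERN = r"[^@\s]+@[^@\s]+\.[^@\s]+"

metrics = {
    "completeness": completeness(emails),
    "uniqueness": uniqueness(emails),
    "validity": validity(emails, EMAIL_PATTERN),
}
print(metrics)
```

Each metric is a number between 0 and 1, which makes it easy to compare against a target agreed with the business units and to track over time.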
Processes: We can include in this definition any process that improves the quality of data. For example, everything from the profiling I mentioned in the analysis stage to standardizing, parsing, cleaning, or enriching the data is part of the process itself.
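A small sketch may help show what standardizing, parsing, and cleaning can look like on a single record. The field names, the function names, and the rules themselves (title-casing names, splitting on the first space, keeping only digits in phone numbers) are hypothetical choices for illustration; in a real project they would come from the business requirements. Enrichment is left out because it usually depends on an external reference source.

```python
import re


def standardize_name(raw: str) -> str:
    """Standardize: trim, collapse repeated whitespace, and title-case."""
    return " ".join(raw.split()).title()


def parse_full_name(name: str):
    """Parse: split a full name into first name and the rest, on the first space."""
    first, _, last = name.partition(" ")
    return first, last


def clean_phone(raw):
    """Clean: keep digits only; return None when nothing usable remains."""
    digits = re.sub(r"\D", "", raw or "")
    return digits or None


def process_record(record: dict) -> dict:
    """Apply the three steps above to one raw record."""
    name = standardize_name(record.get("name", ""))
    first, last = parse_full_name(name)
    return {
        "first_name": first,
        "last_name": last,
        "phone": clean_phone(record.get("phone")),
    }


# Hypothetical raw record as it might arrive from a source system.
raw = {"name": "  ada   LOVELACE ", "phone": "(555) 123-4567"}
cleaned = process_record(raw)
print(cleaned)
```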
In line with the definition above, a few basic points are worth emphasizing:
- Data Quality is the PROBLEM of every organization
- Data Quality is not only a PRODUCT or a project but also a PROCESS
- The final decision on Data Quality belongs to Business Units
- Data Quality makes sense with a healthy Data Integration infrastructure
Effects of Data Quality
So, what are the effects of data quality from the perspective of organizations?
Data Quality Methodology
As the last topic of this article, let us go through the project steps that are implemented and monitored in data quality work:
- Business requirements: As I mentioned at the beginning of the article, it is important that the relevant units and departments within the organization understand the data quality problems encountered and clearly understand the purpose and scope of the targeted project.
- Data Discovery and Analysis: Detecting and discovering data quality problems through the analyses performed on the data.
- Metric Identification: Determining the goals to be reached and defining the metrics I mentioned accordingly.
- Quality Rules: Implementing the data quality improvement processes I mentioned in the definition of data quality.
- Exceptions: Managing the exceptions that may emerge during this work and in the subsequent processes.
- Following up and Reporting: Following up on the processes through the relevant reports in order to keep all of the work under control.
None of the steps above is a one-off task. Together they form an ongoing process, and continuous monitoring and management are of great importance.
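The last three steps (quality rules, exceptions, and reporting) can be sketched together in a few lines. This is only a minimal illustration under my own assumptions: the `Rule` class, the `run_rules` function, the two example rules, and the sample records are all hypothetical, and a real project would store exceptions and reports somewhere more durable than Python lists and dictionaries.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    """A named quality rule: check(record) returns True when the record passes."""
    name: str
    check: Callable[[dict], bool]


def run_rules(records, rules):
    """Apply every rule to every record; return pass rates and the exception list."""
    passed = {rule.name: 0 for rule in rules}
    exceptions = []  # (record index, rule name) pairs for later follow-up
    for index, record in enumerate(records):
        for rule in rules:
            if rule.check(record):
                passed[rule.name] += 1
            else:
                exceptions.append((index, rule.name))
    total = len(records)
    report = {name: (count / total if total else 0.0) for name, count in passed.items()}
    return report, exceptions


# Hypothetical rules and records, for illustration only.
rules = [
    Rule("customer_id_present", lambda r: bool(r.get("customer_id"))),
    Rule("age_in_range", lambda r: isinstance(r.get("age"), int) and 0 <= r["age"] <= 120),
]
records = [
    {"customer_id": "C1", "age": 34},
    {"customer_id": "", "age": 29},
    {"customer_id": "C3", "age": 230},
    {"customer_id": "C4", "age": 41},
]

report, exceptions = run_rules(records, rules)
print(report)
print(exceptions)
```

Running the same rules on a schedule and comparing the resulting pass rates over time is one simple way to support the continuous monitoring described above.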
Let’s end this article here. I hope to meet you again in the next one.