Components of a data validation strategy
May 12, 2025
A data validation strategy combines economic goals with the appropriate technical means. It takes shape against the backdrop of a process and IT landscape that favors or hinders certain procedures. Objectives, such as avoiding production downtime caused by faulty documents, are relatively easy to determine. Selecting the appropriate means, on the other hand, is often more difficult, as these have both organizational and technical implications. To make it easier for you to develop a strategy, we present the characteristics of various approaches to data validation below.
Checking single documents or aggregated data
Let's start with the terms: a single document could be an eInvoice, for example. Aggregated data, on the other hand, is a structure in which a large amount of information or many documents are stored together, such as the contents of a data warehouse. One factor for the comparison is the response time between sending the data and receiving the validation report; another is the effort required to identify recurring error patterns.

In terms of response time, single-document checks are ahead of the game. If data senders receive feedback within one to two minutes, in most cases they can still correct their documents so that the supply chain is not impaired. One prerequisite is a defined correction process that they can follow; another is that the data sender has personnel ready to act immediately when an error occurs. When aggregated data is checked, by contrast, no feedback is sent while the documents are being collected for the subsequent check. In just-in-time or just-in-sequence processes, this can simply take too long.
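To make the single-document case concrete, here is a minimal sketch in Python. The eInvoice fields and the rules are invented for illustration and do not refer to any particular standard or product:

```python
from dataclasses import dataclass, field


@dataclass
class ValidationReport:
    """Feedback for exactly one document, returned within seconds."""
    document_id: str
    errors: list[str] = field(default_factory=list)

    @property
    def is_valid(self) -> bool:
        return not self.errors


def validate_invoice(invoice: dict) -> ValidationReport:
    # Hypothetical rules; a real check interface would hold many more.
    report = ValidationReport(document_id=invoice.get("id", "<unknown>"))
    if not invoice.get("buyer_reference"):
        report.errors.append("Missing buyer reference")
    if invoice.get("total_amount", 0) <= 0:
        report.errors.append("Total amount must be positive")
    return report


# The sender gets the report immediately and can still correct the document.
print(validate_invoice({"id": "INV-4711", "total_amount": 0}).errors)
```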
When it comes to revealing recurring error patterns, however, checks on aggregated data have the advantage. If thousands of files are checked individually and several hundred documents contain errors, a correspondingly large number of validation reports are generated. To prevent these from overwhelming their recipients, an aggregation mechanism for presenting the results must be implemented. This often requires a separate project and therefore more effort than validating data that is already aggregated.
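Such an aggregation mechanism can be as simple as counting error messages across all single-document reports. A minimal sketch, assuming a hypothetical report structure:

```python
from collections import Counter

# Hypothetical reports, one per checked file; the shape is an assumption.
reports = [
    {"document_id": "INV-0001", "errors": ["Missing buyer reference"]},
    {"document_id": "INV-0002", "errors": []},
    {"document_id": "INV-0003", "errors": ["Missing buyer reference",
                                           "Total amount must be positive"]},
]

# Condense hundreds of individual reports into one readable summary.
error_counts = Counter(e for r in reports for e in r["errors"])
for message, count in error_counts.most_common():
    print(f"{count:>4}  {message}")
```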
Single-document checks therefore play to their strengths where multiple potential sources of error have to be checked in time-critical processes. Checks on aggregated data, on the other hand, are particularly recommended where you only want to examine one specific, non-time-critical aspect of a document type across all data senders, for example, which partners particularly often omit contact details for inquiries in their order confirmations.
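Phrased as code, such an aggregated check is a single query over the collected data. The rows and column names below are assumptions for illustration:

```python
from collections import Counter

# Order confirmations as they might sit in a data warehouse table.
rows = [
    {"partner": "ACME",   "contact_email": "sales@acme.example"},
    {"partner": "ACME",   "contact_email": ""},
    {"partner": "Globex", "contact_email": ""},
    {"partner": "Globex", "contact_email": ""},
]

# One specific, non-time-critical question across all data senders:
# which partners most often omit contact details?
missing = Counter(r["partner"] for r in rows if not r["contact_email"])
print(missing.most_common())  # [('Globex', 2), ('ACME', 1)]
```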
Checking in the original format or the target format
Here, too, there are a number of comparison factors that you can formulate as key questions when developing your data validation strategy:
- Should the causes of errors be tackled or is it enough to deal with their symptoms?
- Are the people receiving the feedback familiar with the data format in which the errors are presented?
- How advantageous is it to need only a single check interface per document type?
- How problematic is it if individual documents have already been corrected by other people or applications before validation, so that they can no longer be recognized as originally incorrect?
- At what point in the data stream do you have all the information you need for validation?
The advantage of checking documents in their original format is that you can send the validation report to the original data sender. This allows you to tackle the causes of errors rather than their symptoms. Data senders will find it easier to rectify particularly complex errors if they receive a comparison of actual and target values based on their own document. Conversely, they will find it harder to correct errors that are explained in terms of the target format. However, if the errors are corrected on your side anyway, it can make sense to check documents in the target format straight away. This is particularly interesting if you have no influence on the data sender and therefore cannot expect any improvements on their side. With common target formats such as CSV, JSON or XML, you also reduce the familiarization time for new employees and thus increase the efficiency of the correction processes.
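As an illustration of such actual/target feedback, here is a sketch that phrases an error in terms of the sender's own document. The field location and the expected value are invented examples:

```python
def feedback_line(location: str, actual: str, expected: str) -> str:
    """One actual/target comparison, anchored in the sender's document."""
    return f"{location}: found '{actual}', expected {expected}"


# The sender sees the error where it occurs in their own structure,
# not translated into a target format they may never have seen.
print(feedback_line(
    location="Invoice/PaymentTerms/DueDate",
    actual="2025-02-31",
    expected="an existing calendar date",
))
```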
Check interfaces are not static constructs. New requirements from the business departments make new checks necessary or force existing ones to be adapted. Since validation in the target format usually gets by with significantly fewer check interfaces than separate checks for each source format, such adaptations require less effort and are less error-prone. In addition, you can more easily identify errors that a data sender makes across different data formats. How much weight this factor carries depends on the number of source formats.
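The following sketch illustrates the efficiency argument: one rule set written against the canonical target structure, reached through one converter per source format. The converter, fields, and rules are assumptions:

```python
def check_target(record: dict) -> list[str]:
    """Single check interface, written once against the target format."""
    errors = []
    if not record.get("order_number"):
        errors.append("Missing order number")
    return errors


def convert_from_csv(line: str) -> dict:
    # One converter per source format, but only one set of checks overall.
    order_number, partner = line.split(";")
    return {"order_number": order_number, "partner": partner}


# Whatever the source format, the same checks run after conversion.
print(check_target(convert_from_csv("4711;ACME")))  # []
print(check_target(convert_from_csv(";ACME")))      # ['Missing order number']
```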

However, the conversion of a document can also fail if it contains serious structural errors. In that case, the document never exists in the target format and therefore cannot be checked there. If it is corrected manually in the course of the business process, in the worst case nobody will even notice that the original data contained an error. You avoid this problem if you check documents in their original format; if such problems hardly ever occur, however, this factor is negligible. In some check scenarios, data relevant to the check is only added after the conversion process and integrated into the target format. In that case, a check in the original format is not possible.
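A minimal sketch of this failure mode, using JSON parsing as a stand-in for any source-to-target conversion:

```python
import json


def check_original(raw: str) -> list[str]:
    """A check in the original format still sees structurally broken data."""
    try:
        json.loads(raw)  # stands in for the conversion to the target format
        return []
    except json.JSONDecodeError as exc:
        return [f"Structural error in original document: {exc}"]


broken = '{"id": "INV-4711", "total":'   # truncated, conversion would fail
print(check_original(broken))            # the error is recorded, not lost
```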
You should prefer checks in the original format if the original data senders are supposed to correct errors themselves with lasting effect. As a rule, these will be external data senders. Validating a standardized target format, on the other hand, is attractive if either a party other than the data sender makes the corrections or the format is also known to the data sender. In such cases you gain the efficiency advantage of fewer check interfaces while still being able to assume that the data sender can understand the feedback. This applies above all to data streams within a company. The other factors can tip the scales in favor of one of the two approaches, but should not usually carry decisive weight.
The goal determines the path
As you can see, every approach has advantages and disadvantages. The ideal validation strategy is the one that achieves your goals to the greatest extent with the least effort. To find it, you should first get a clear picture: Which goals do you want to achieve? Which means are suitable for reaching them? And which circumstances will influence the result? You don't have to go down this path alone, however: in data quality projects, we are happy to advise you on developing the right validation strategy.