New ISO/IEC 5259 certification for AI data quality

The importance of data quality in AI

AI data quality can be understood as the set of properties (from both technical and business viewpoints) that make data suitable for training, validating, and operating AI models in a reliable, robust, fair, and auditable manner. In AI, poor-quality data leads to poor results: it can create bias, instability, and decisions that are impossible to explain.

Data quality is important for AI because, in practice, data is the input that defines what the model learns, how it generalises, and how it behaves in production. If the input is not of high quality, we cannot expect the output, what the AI returns to us, to be of high quality either. Data quality, therefore:

  • Determines what the model learns (and what it does not learn). A model does not learn ‘reality’; it learns statistical patterns from the data set. If there are errors, omissions, or inconsistencies, the model learns incorrect or incomplete patterns.
  • Avoids bias and unfair decisions. Non-representative data (e.g., data that under-represents certain areas, ages, or customer types) causes the model to perform very well for some cases and very poorly for others. In AI, this translates into indirect discrimination and systematically worse results for some subgroups.
  • Reduces instability and "capricious" results. Inconsistencies, duplicates, schema changes, outliers, and noisy labels make models overly sensitive: small changes in input produce large changes in output. That undermines confidence and complicates maintenance.
  • Improves actual performance, not just laboratory metrics. Poor-quality data can inflate metrics due to data leakage, poor sampling, or incorrect labelling. Data quality helps ensure that evaluations are honest and that performance is transferable to production.
  • Enables explainability and traceability. If you do not know where each variable comes from, how it was transformed, which version of the dataset was used, and under what rules, you cannot explain or reproduce the model's decisions. This is critical in audits and high-impact systems.
  • Is the basis for compliance and auditing. Privacy, minimisation, purpose, retention, and internal (and regulatory) controls depend on having controlled data: lineage, consent where applicable, label quality, incident management, etc.
  • Increases robustness in production. Even if you train with perfect data, the world changes: data drift and concept drift. If you do not monitor the quality of the input data and the stability of distributions, the model degrades "silently".
  • Reduces costs and iteration cycles. A large share of the effort in AI projects typically goes into data. Improving its quality reduces rework: less debugging, less unnecessary retraining, fewer incidents, and faster decisions on whether a model is viable.
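Several of the points above (omissions, duplicates, data leakage) lend themselves to simple automated checks. The following is a minimal sketch in Python, assuming records are plain dictionaries with hashable values. The function names (completeness, duplicate_rate, leakage_rate) and the sample data are hypothetical and are not taken from any standard.

```python
from typing import Any

Record = dict[str, Any]


def completeness(records: list[Record], field: str) -> float:
    """Share of records in which `field` is present and not None."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if r.get(field) is not None)
    return filled / len(records)


def _key(record: Record) -> tuple:
    # Order-independent, hashable representation of a record
    # (assumes all values are hashable).
    return tuple(sorted(record.items()))


def duplicate_rate(records: list[Record]) -> float:
    """Share of records that are exact copies of another record."""
    if not records:
        return 0.0
    unique = {_key(r) for r in records}
    return 1 - len(unique) / len(records)


def leakage_rate(train: list[Record], test: list[Record]) -> float:
    """Share of test records that also appear verbatim in the training set."""
    if not test:
        return 0.0
    train_keys = {_key(r) for r in train}
    return sum(1 for r in test if _key(r) in train_keys) / len(test)


# Hypothetical sample data, for illustration only.
train = [
    {"age": 34, "region": "north", "label": 1},
    {"age": None, "region": "south", "label": 0},
    {"age": 34, "region": "north", "label": 1},  # exact duplicate
    {"age": 51, "region": "south", "label": 0},
]
test = [
    {"age": 34, "region": "north", "label": 1},  # also present in train
    {"age": 29, "region": "north", "label": 0},
]

print(completeness(train, "age"))
print(duplicate_rate(train))
print(leakage_rate(train, test))
```

Exact-match comparison is the simplest possible choice here; real datasets usually also need near-duplicate detection, which this sketch does not attempt.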

How ISO/IEC 42001 reflects the importance of data quality

The ISO/IEC 42001 standard reflects the importance of data quality very explicitly: it does not treat it as "something desirable", but as an operational requirement/control within the AI Management System (AIMS) that must be planned, executed, evidenced, audited and improved. ISO/IEC 42001:

1) Incorporates data quality as a formal concept of the system.

The standard defines "data quality" as a characteristic linked to data meeting the organisation's requirements for a specific context.

2) Makes data quality a specific control: A.7.4 "Quality of data for AI systems"

In the reference controls catalogue (Annex A), under "Data for AI systems (A.7)", there is a dedicated control: A.7.4 Quality of data for AI systems, which requires defining and documenting quality requirements and ensuring that the data used to develop and operate the AI system meets them.

This is the most direct proof of its importance: data quality is managed as a control backed by evidence, whose applicability depends on the scope and the Statement of Applicability (SoA).

3) Requires traceability so that quality is demonstrable (not "debatable")

Alongside quality, Annex A includes controls such as:

  • A.7.5 Data provenance: record origin and transformations.
  • A.7.6 Data preparation: criteria and methods of preparation.

This reinforces the fact that "quality" is not just about metrics: it is about governing the data lifecycle in order to justify results, reproduce them and audit them.
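To make the idea of recording origin and transformations more concrete, here is a minimal sketch in Python. It assumes the dataset fits in memory as a list of JSON-serialisable dictionaries; the ProvenanceLog class, the fingerprint helper and the source name are hypothetical illustrations, not something prescribed by ISO/IEC 42001.

```python
import hashlib
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone


def fingerprint(rows: list[dict]) -> str:
    """SHA-256 of a canonical JSON serialisation of the rows."""
    payload = json.dumps(rows, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


@dataclass
class ProvenanceLog:
    """Append-only list of the steps applied to a dataset."""
    source: str
    steps: list[dict] = field(default_factory=list)

    def record(self, step: str, rows: list[dict]) -> None:
        # Store what was done, how many rows resulted, and a content hash
        # so that the exact dataset version can be identified later.
        self.steps.append({
            "step": step,
            "rows": len(rows),
            "sha256": fingerprint(rows),
            "at": datetime.now(timezone.utc).isoformat(),
        })


raw = [{"id": 1, "age": 34}, {"id": 2, "age": None}]
log = ProvenanceLog(source="crm_export.csv")  # hypothetical source name
log.record("ingest", raw)

cleaned = [r for r in raw if r["age"] is not None]
log.record("drop_missing_age", cleaned)

for entry in log.steps:
    print(entry["step"], entry["rows"], entry["sha256"][:12])
```

Hashing the content rather than relying on file names means that two runs can be compared by fingerprint alone, which is one simple way to make "which version of the dataset was used" answerable.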

4) Provides implementation guidance that connects quality with validity, bias, and fitness for purpose.

In the implementation guide (Annex B), section B.7.4 explains why quality impacts the validity of outputs, and calls for defining/measuring/improving the quality of training, validation, testing and production data, also considering the impact of bias on performance and fairness.

5) Integrates data quality into the management system's "PDCA cycle"

Although control A.7.4 is the central point, ISO/IEC 42001 complements it with the management approach:

  • Management system requirements that include data governance and lifecycle controls (overview of the standard).
  • Performance evaluation (clause 9): requires monitoring, measuring, analysing and evaluating the AIMS; in practice, this forces the use of KPIs/measures (often for data quality and drift) and periodic reviews/audits.
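As an illustration of what a drift KPI might look like, the sketch below computes the Population Stability Index (PSI), a widely used measure of distribution shift, between a reference sample and a current sample of one numeric feature. This is one possible measure chosen for the example, not one mandated by ISO/IEC 42001; the equal-width binning, the bin count and the small floor value used for empty bins are assumptions, and both samples are assumed to be non-empty.

```python
import math


def psi(expected: list[float], actual: list[float], bins: int = 10) -> float:
    """Population Stability Index between a reference and a current sample.

    Bins are equal-width and derived from the reference sample; values
    outside that range are counted in the first or last bin.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 for constant data

    def shares(values: list[float]) -> list[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        eps = 1e-6  # floor so that empty bins do not produce log(0)
        return [max(c / len(values), eps) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Hypothetical samples: the current data is shifted towards higher values.
reference = [i / 100 for i in range(100)]
current = [0.5 + i / 200 for i in range(100)]

print(psi(reference, reference))
print(psi(reference, current))
```

A PSI of zero means the binned distributions are identical, and larger values mean a larger shift. Where to set the alert threshold is a decision for each organisation; values around 0.2 to 0.25 are a commonly cited rule of thumb, not a requirement.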

How the European AI Regulation reflects the importance of data quality

For its part, the AI Act (Regulation (EU) 2024/1689) makes data quality a regulatory requirement (not just "good practice") because it understands that many risks to safety and fundamental rights arise directly from flawed, biased or unrepresentative data.

1) Explicit requirement for "high-risk" AI systems: Article 10

The most direct reflection is in Article 10 (Data and data governance): it requires that training, validation and test data sets be:

  • relevant to the purpose,
  • sufficiently representative,
  • and, as far as possible, error-free and complete, with statistical properties suitable for the persons/groups affected by the system.

This is precisely what regulatory "importance" means: if your data does not comply, your high-risk system does not comply with the AI Act.
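One way to get a first indication of whether a data set is "sufficiently representative" is to compare the share of each group in the data with its share in the population the system is intended for. The sketch below assumes that such reference shares are available; the group names, the shares and the 5-point tolerance are hypothetical, and the AI Act does not prescribe this or any other specific calculation.

```python
from collections import Counter


def representation_gaps(
    groups: list[str], reference: dict[str, float]
) -> dict[str, float]:
    """Observed share minus reference share, for each group in `reference`."""
    if not groups:
        raise ValueError("groups must not be empty")
    counts = Counter(groups)
    total = len(groups)
    return {g: counts.get(g, 0) / total - share for g, share in reference.items()}


def under_represented(gaps: dict[str, float], tolerance: float = 0.05) -> list[str]:
    """Groups whose observed share is below the reference by more than `tolerance`."""
    return sorted(g for g, gap in gaps.items() if gap < -tolerance)


# Hypothetical example: age bands in a training set vs. the target population.
sample_groups = ["18-34"] * 60 + ["35-64"] * 35 + ["65+"] * 5
population = {"18-34": 0.30, "35-64": 0.50, "65+": 0.20}

gaps = representation_gaps(sample_groups, population)
print(gaps)
print(under_represented(gaps))
```

A check like this only covers one attribute at a time and says nothing about label quality or performance per group, so it is best read as a screening step rather than as evidence of compliance.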

2) It requires "data governance", not just measuring metrics.

Article 10 itself requires data governance and management practices (how data is collected, prepared, recorded, controlled and versioned), including addressing biases, data gaps and adaptation to the actual context of use (geographical, behavioural, functional, etc.).

In other words: the AI Act does not accept "the model already performs well"; it calls for processes and controls to ensure that data is controllable and defensible.

3) It makes it demonstrable: technical documentation and conformity assessment

For high-risk systems, the Regulation requires technical documentation to be prepared before the system is placed on the market or put into service, and to be kept up to date (this is what is then used to demonstrate compliance, including with regard to data).

And that logic connects directly with the market: if you cannot demonstrate (with evidence) that you manage data quality in accordance with requirements, the system should not be allowed to be marketed as compliant.

4) It also affects general-purpose models (GPAI): transparency regarding training and data

For general-purpose models, the AI Act imposes technical documentation obligations (Annex XI) that include information on the training process and the data used, and also requires a "public summary" of the content used for training (with guidelines and templates from the Commission). This reinforces accountability for the model's ‘raw material’, i.e. the data.

The ISO/IEC 5259 standard

Given the weight that references such as the AI Act and ISO/IEC 42001 place on data quality, the ISO/IEC 5259 family of standards is particularly relevant. Its objective is to help define, measure, manage and govern the quality of data used in analytics and machine learning (ML), so that the results are reliable, comparable and auditable.

The ISO/IEC 5259 standard is based on previous data quality standards (e.g. ISO/IEC 25012 and ISO 8000) but applies them to the ML context: measuring/ensuring quality with a language and practice more oriented towards pipelines, labels, evaluation, etc.

The ISO/IEC 5259 standard consists of:

    • 5259-1: Overview, terminology, and examples. This is the “gateway”: it defines the framework, concepts and examples for understanding and relating to the rest of the parts.
    • 5259-2: Data quality measures. This is the most operational part. It defines a quality model and a set of characteristics and metrics, as well as guidelines for reporting data quality in analytics/ML.
    • 5259-3: Data quality management requirements and guidelines. Defines requirements and guidelines for establishing and improving a data quality management system applicable to analytics/ML (similar to a "management system" but focused on data).
    • 5259-4: Data quality process framework. Provides a standardised process framework for managing quality, with a practical focus (e.g. labelling, evaluation and management throughout the lifecycle), applicable to different types of ML (supervised, unsupervised, etc.).
    • 5259-5: Data quality governance framework. Defines how the organisation should manage and monitor data quality: roles, responsibilities, accountability and governance controls to ensure that measures/processes are applied throughout the organisation and its life cycle.

    Focusing on ISO/IEC 5259-2, its scope is very clear: a quality model, a set of metrics, and a mechanism for reporting the quality of data used in analytics/ML tasks (training, validation, testing, and, by practical extension, operation). Furthermore, it is applicable to any organisation that wants to meet data quality objectives in this context.
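The general idea of pairing measures with target values and reporting the outcome can be sketched as follows. This is an illustrative structure only: the Criterion class, the measure names and the thresholds are hypothetical and do not reproduce the quality model, the measures or the reporting format defined in ISO/IEC 5259-2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    """A quality requirement: a named measure and the value it must reach."""
    measure: str
    threshold: float
    higher_is_better: bool = True

    def passes(self, value: float) -> bool:
        if self.higher_is_better:
            return value >= self.threshold
        return value <= self.threshold


def evaluate(criteria: list[Criterion], measured: dict[str, float]) -> list[dict]:
    """Compare measured values with the criteria; a missing value counts as a failure."""
    report = []
    for c in criteria:
        value = measured.get(c.measure)
        ok = value is not None and c.passes(value)
        report.append({
            "measure": c.measure,
            "value": value,
            "threshold": c.threshold,
            "pass": ok,
        })
    return report


# Hypothetical criteria and measurements, for illustration only.
criteria = [
    Criterion("completeness", 0.98),
    Criterion("duplicate_rate", 0.01, higher_is_better=False),
    Criterion("label_agreement", 0.90),
]
measured = {"completeness": 0.995, "duplicate_rate": 0.03}

report = evaluate(criteria, measured)
for row in report:
    print(row)
```

Treating a measure that was never computed as a failure, rather than silently skipping it, keeps the report honest about what has actually been evidenced.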

    ISO/IEC 5259-2 is based on ISO/IEC 25012 and the measurements of ISO/IEC 25024, but adapted to the needs of AI. To this end, it takes as its basis the quality characteristics defined by ISO/IEC 25012 and supplements them with a new set of characteristics and metrics specific to AI, as shown in the following figure (taken from https://iso25000.com/index.php/en/iso-25000-standards/iso-5259).

    At I2SC, we are pioneers in ISO/IEC 5259 certification

    Therefore, ISO/IEC 5259 enables organisations to define quality criteria for their AI data and subsequently demonstrate compliance with these criteria through a set of metrics that ensure confidence in the data used to train, validate and operate AI.

    At I2SC, we are pioneers in ISO/IEC 5259 certification, offering our clients the possibility of certifying the quality of their AI system data, relying on assessments carried out by the only internationally accredited laboratory, AQCLab.

    These AI data quality certifications enable our customers not only to comply with the data quality provisions of references such as the AI Act or ISO/IEC 42001, but also to have peace of mind regarding the data used in AI.

    If you are interested in the quality of your data for AI and in certifying it under the ISO/IEC 5259 standard, please do not hesitate to contact us.