Why are new complex strategies necessary to certify, audit, and evaluate AI systems?

Artificial intelligence systems are, from a cognitive point of view, quite complex: expectations about their performance, and about the mechanisms that enable them to make inferences in the sense described above, are not as precise and objectifiable as in systems that could be called conventional software.

For example, the expected error rate of a payroll management system is zero: once the system has been evaluated, it makes no sense for it to produce any errors at all. By contrast, a decision support system for the diagnosis and treatment of a specific group of diseases could have a success rate of 70% and still exceed the success rate of many reputable hospital teams.

For these and other reasons, it is necessary to design new complex strategies for the certification of this type of system, and this is where I2SC plays a key role thanks to its many years of experience in software certification and artificial intelligence.

What should these strategies focus on?

As we have seen, intelligent systems have specific characteristics that, in most cases, do not allow precise metrics to be used to evaluate their different components.

Regarding the inputs that feed these systems, some techniques already exist for studying structured, numerical, and normalised data sets (the final input to any machine learning programme or library). However, these techniques are usually capable of measuring only statistical characteristics such as dispersion, and it is not easy, for example, to evaluate the potential of the data to produce results for a specific problem. The matter becomes even more complicated when the potential of the available human knowledge about that problem must be evaluated.
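A minimal sketch of the kind of statistical profiling mentioned above, using invented feature columns (all names and values here are illustrative, not real certification data):

```python
import statistics

# Hypothetical normalised feature columns from a structured data set.
features = {
    "age":    [0.21, 0.35, 0.35, 0.80, 0.64],
    "income": [0.50, 0.50, 0.50, 0.50, 0.50],
}

for name, values in features.items():
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)    # dispersion around the mean
    spread = max(values) - min(values)  # range, another dispersion measure
    print(f"{name}: mean={mean:.2f} stdev={stdev:.2f} range={spread:.2f}")

# A column with zero dispersion (like "income" here) carries no information,
# yet summaries of this kind say nothing about whether the data has the
# potential to solve a given problem -- exactly the limitation noted above.
```

Such summaries are easy to automate; judging the adequacy of the data for a concrete inference task is the part that still resists simple metrics.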

As for the inference mechanisms, whether machine learning algorithms or logic-based inference mechanisms, the main difficulty lies in measuring the capacity for convergence, that is, the ability to generate patterns under given conditions. And in mechanisms that emulate human reasoning through formal structures, the main difficulty lies in the capacity to represent and manipulate knowledge as complex as that which we humans handle.
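As a toy illustration of what "measuring convergence" can mean in practice, the sketch below trains a simple perceptron on a linearly separable problem (logical AND) and records the misclassifications per epoch; the learning process has converged once an epoch produces zero errors. The data and parameters are invented for illustration only:

```python
# Training pairs for logical AND: ((x1, x2), target).
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b, lr = [0.0, 0.0], 0.0, 0.1  # weights, bias, learning rate

errors_per_epoch = []
for epoch in range(25):
    errors = 0
    for (x1, x2), target in data:
        pred = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
        delta = target - pred
        if delta != 0:  # perceptron update rule on each mistake
            errors += 1
            w[0] += lr * delta * x1
            w[1] += lr * delta * x2
            b += lr * delta
    errors_per_epoch.append(errors)
    if errors == 0:  # convergence: a full pass with no errors
        break

print("errors per epoch:", errors_per_epoch)
```

For a problem this simple the error count reaches zero within a few epochs; for real systems, whether (and how fast) such a curve flattens under given conditions is precisely what is hard to measure and certify.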

Finally, the capacity of the developed system to achieve the initially proposed objectives must be studied: how well it predicts, how good the content it generates is, how good its recommendations are, or how well it supports the decisions for which it was designed. This is what is usually called "validation" of the system, but in the case of intelligent systems it is much more complex because of the vague expectations we have already discussed.
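A minimal sketch of what such a validation might compute, using invented predictions from a hypothetical diagnostic support system: a 70% accuracy, as in the diagnostic example above, can coexist with quite different precision and recall, which is one reason a single headline figure is rarely enough.

```python
# All figures are invented for illustration: 1 = disease present, 0 = absent.
actual    = [1, 0, 1, 1, 0, 1, 0, 1, 0, 1]
predicted = [1, 0, 0, 1, 0, 1, 1, 1, 0, 0]

tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)

accuracy  = (tp + tn) / len(actual)
precision = tp / (tp + fp)  # how trustworthy a positive diagnosis is
recall    = tp / (tp + fn)  # how many real cases the system catches

print(f"accuracy={accuracy:.0%} precision={precision:.0%} recall={recall:.0%}")
```

Computing such metrics is the easy part; deciding which of them matter, and what value counts as "good enough" for a system with inherently vague expectations, is where certification becomes genuinely difficult.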

This is where I2SC once again plays a key role, thanks to the experience it has accumulated over many years in software certification and artificial intelligence.
