The Erasmus+ Agency opts for efficiency with AI-driven data analysis

In December 2024, the Erasmus+ Agency, based in Bordeaux, integrated an AI solution into its information system. The solution allows it to pre-analyze a portion of its administrative documents at scale.

Erasmus+ faces the challenge of data extraction

The Erasmus+ program is the European program for education, training, youth, and sport. But it is not only for students. It is open to all audiences in the fields of education and training, both formal and non-formal. Launched in 1987 under the auspices of the European Commission, Erasmus+ builds connections far beyond European borders. It links the 27 EU member states, 6 associated non-EU countries, and an extensive network of partners worldwide.

Applying for an Erasmus+ project requires submitting a set of administrative documents that allow organizations to identify themselves to the agency. The format of the documents can vary: they may be PDF files or images of scanned documents. The documents can be filled out digitally or by hand. While the variety of cases does not significantly impact human reading and data extraction, it makes the computerization of the process more difficult.

The IT department of the Erasmus+ Agency wanted to explore the possibility of analyzing these documents and extracting information using AI. If successful, this automation would save staff significant time by making it easier to compare the extracted data with legal information and by handling most of the tedious work of transcribing the processed information.

Once this initial use case was defined, the agency created a demonstrator to verify the technical feasibility of a potential solution. Because the demonstrator proved promising, the question arose of turning the idea into a production system. The goal of this production deployment was to make the results available to the agency’s staff.

For several months, the Agency has been working on identifying time-consuming tasks in our management processes. The identification and processing of supporting documents provided by applicants seemed to us the ideal area to explore for implementing AI.

Christophe TREZEGUET, Director of Information Systems at Erasmus+

Initiation of a Collaboration

Erasmus+ chose to work with Onepoint because of its expertise in automated data processing using AI. Our skills and adaptability reassured the client that we could meet their needs. A shared willingness to understand all the possibilities offered by machine learning made this collaboration possible.

From Data Retrieval to Learning: A Three-Step Mission

Acquiring data is one thing. Defining the entire set of processes that will generate value is another.

To achieve this goal, the mission went through three major phases. Before the mission began, Erasmus+ had developed a theoretical demonstrator: a prototype designed to show how an AI-based tool could efficiently extract data. Its purpose was twofold: to demonstrate the feasibility of AI-driven extraction and to lay the foundation for an operational tool.

The work required the involvement of a data scientist for twenty days at the end of 2024.

First Step: Studying the Existing System

The first phase of the mission consisted of an audit of the demonstrator produced by Erasmus+. Beyond the technical analysis, the aim was to validate the methods implemented for data processing.

This first phase raised several questions: Are the machine learning models used robust enough for production deployment? Are they explainable, i.e., can we understand how they make decisions? If yes, to what extent? Can the processing performed by these models be improved? Will the processing be fast enough once scaled for production?

Second Step: Consolidating AI-related Choices

Exploring these questions highlighted that the choice of models needed to be consolidated. The problem was split into two sub-problems: identifying the document type and then determining the information to extract from it.
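To make this split concrete, here is a minimal Python sketch of the two sub-problems chained together. It is an illustration only, not the ValidIA implementation: the document types (`bank_details`, `registration_certificate`), the field names, the regular expressions and the toy classification rule are all hypothetical.

```python
import re
from typing import Dict, Optional

# Hypothetical document types and fields, chosen only for illustration.
FIELD_PATTERNS: Dict[str, Dict[str, str]] = {
    "bank_details": {"iban": r"\b[A-Z]{2}\d{2}[A-Z0-9]{10,30}\b"},
    "registration_certificate": {"registration_number": r"\b\d{9}\b"},
}


def classify(text: str) -> Optional[str]:
    """Sub-problem 1: decide which type of document this is (toy rule)."""
    lowered = text.lower()
    if "iban" in lowered:
        return "bank_details"
    if "registration" in lowered:
        return "registration_certificate"
    return None  # unknown type: leave the document for manual review


def extract(text: str, doc_type: str) -> Dict[str, Optional[str]]:
    """Sub-problem 2: pull out only the fields expected for that type."""
    fields: Dict[str, Optional[str]] = {}
    for name, pattern in FIELD_PATTERNS[doc_type].items():
        match = re.search(pattern, text)
        fields[name] = match.group(0) if match else None
    return fields


def analyze(text: str) -> Dict[str, object]:
    """Chain the two sub-problems: classify first, then extract."""
    doc_type = classify(text)
    if doc_type is None:
        return {"type": None, "fields": {}}
    return {"type": doc_type, "fields": extract(text, doc_type)}
```

The point of the structure is that the extraction step only looks for the fields that make sense for the type identified in the first step.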

Seeking a Balance Between Efficiency and Explainability

For the classification phase, the demonstrator used a type of model called a neural network. This type of model currently produces some of the best results in the field. However, it also has the disadvantage of being difficult to explain and hard to maintain over time. Additionally, data processing can take longer if the model is not properly sized.

Knowing this, instead of relying on a single complex model, we chose to define a set of specialized models. These models are lighter, require less data for training, and are therefore easier to implement and maintain. The advantage of this structure is that each model focuses on a specific task. This makes the system both easier to explain and easier to update. Each model was selected as the best compromise with respect to the questions raised during the audit: robustness, explainability, and processing speed.
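The idea of several small, single-purpose models can be sketched as follows. This is a simplified illustration under stated assumptions, not the models actually used: each "specialized model" is stood in for by a keyword scorer, and the class name `SpecializedDetector`, the function `classify_with_detectors`, the document types and the weights are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class SpecializedDetector:
    """One lightweight model dedicated to a single (hypothetical) document type."""
    doc_type: str
    keyword_weights: Dict[str, float]
    threshold: float = 1.0

    def score(self, text: str) -> Tuple[float, List[str]]:
        """Return the score and the keywords that produced it."""
        lowered = text.lower()
        hits = [kw for kw in self.keyword_weights if kw in lowered]
        return sum(self.keyword_weights[kw] for kw in hits), hits


def classify_with_detectors(
    text: str, detectors: List[SpecializedDetector]
) -> Tuple[Optional[str], List[str]]:
    """Ask every detector and keep the best one that clears its own threshold."""
    best_type: Optional[str] = None
    best_score = 0.0
    best_hits: List[str] = []
    for detector in detectors:
        score, hits = detector.score(text)
        if score >= detector.threshold and score > best_score:
            best_type, best_score, best_hits = detector.doc_type, score, hits
    return best_type, best_hits


# Hypothetical detectors; a real system would train or tune one per document type.
detectors = [
    SpecializedDetector("bank_details", {"iban": 1.0, "bic": 0.5}),
    SpecializedDetector("registration_certificate", {"registration": 1.0, "certificate": 0.5}),
]
```

Because each detector returns the evidence behind its score, the decision can be traced back to a specific model, and a new document type can be supported by adding one detector without retraining the others.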

Once classification was addressed, we could turn to the extraction of the information itself. The task of converting an image into textual content is called optical character recognition (OCR).

This task proved to be particularly complex in our case. An analysis system would face two major challenges. The first is identifying the areas of the image that actually contain text. The second is adapting to the fact that the writing may be handwritten or digitally entered.

Using an External Tool as the Best Compromise

Given the limited time frame of the mission, we decided to delegate this task to a third-party tool providing this analysis service.

The major advantage of this solution is that it offers a much better cost/quality ratio than a solution produced solely in-house. Another advantage of the analysis-as-a-service model is that it delegates the maintenance of the tools to the company providing the service.

Third Step: Moving from Theory to Practice

Once these points were resolved, the next step was to begin development for deployment and use of the application by Erasmus+ staff. The deployment method was chosen to integrate as seamlessly as possible into the agency’s infrastructure.

This classification and information extraction system, which we call ValidIA (Validation by AI), was designed to be used as an API, so that we could query it from our management tool without having to integrate a heterogeneous technology into the existing system.

Christophe TREZEGUET, Director of Information Systems at Erasmus+
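As an illustration of what exposing such a system as an API can look like, here is a minimal sketch using only the Python standard library. It is not the ValidIA interface: the JSON shape, the `document_id` field, the function and class names, and the port are assumptions made for the example, and the analysis result is a placeholder.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Any, Dict, Tuple


def handle_analysis_request(body: bytes) -> Tuple[int, Dict[str, Any]]:
    """Turn a JSON request body into (HTTP status, JSON-serializable response)."""
    try:
        payload = json.loads(body)
    except ValueError:  # covers invalid JSON and undecodable bytes
        return 400, {"error": "body must be valid JSON"}
    if not isinstance(payload, dict) or "document_id" not in payload:
        return 400, {"error": "missing 'document_id'"}
    # Placeholder result: a real service would run classification and extraction here.
    return 200, {"document_id": payload["document_id"], "type": None, "fields": {}}


class AnalysisHandler(BaseHTTPRequestHandler):
    """Hypothetical HTTP wrapper around handle_analysis_request."""

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        status, response = handle_analysis_request(self.rfile.read(length))
        data = json.dumps(response).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def make_server(port: int = 8000) -> HTTPServer:
    """Build (but do not start) the server; call .serve_forever() to run it."""
    return HTTPServer(("127.0.0.1", port), AnalysisHandler)
```

With this kind of boundary, the management tool only needs to send an HTTP request and read a JSON response, whatever technology runs behind the API.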

AI as a Toolbox

At the end of the mission, we successfully deployed the application. The agency’s staff was then able to access it.

The development and initial tests revealed needs that were not covered by the original demonstrator.

We added these additional features as we went along. The goal was to equip users as effectively as possible within the timeframe of the mission.

The application scans several thousand documents each night. This nightly run significantly reduces the time staff spend manually extracting and verifying the entered data.

A second benefit was optimizing the user experience by performing most of the model calculations outside of working hours.
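The principle of computing at night and reading results during the day can be sketched in a few lines of Python. This is a simplified illustration, not the deployed scheduler: the function name `run_nightly_batch`, the in-memory dictionaries and the `analyze` callable are hypothetical stand-ins for the real storage and models.

```python
from typing import Any, Callable, Dict


def run_nightly_batch(
    pending: Dict[str, str],
    analyze: Callable[[str], Any],
    results: Dict[str, Any],
) -> int:
    """Analyze every pending document not yet processed; return how many were handled."""
    processed = 0
    for doc_id, text in pending.items():
        if doc_id in results:
            continue  # already analyzed during a previous night
        results[doc_id] = analyze(text)
        processed += 1
    return processed
```

During working hours, staff would then only read from `results`, so no model computation happens while they wait.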

Thinking Beyond the Existing System

Building on this success, Erasmus+ France is already exploring other administrative tasks where AI could be applied. At Onepoint, we see immense potential in these technologies, which we are ready to harness for our partners.