Generative AI

Better AI for code development

Changed on 28/05/2025

In no time at all, Generative Artificial Intelligence has made its way into the hands of developers, who now routinely use it to design their software. But despite the lightning progress and astounding performance, problems remain when it comes to writing programs with this technology. Around ten Inria teams and several partners are working together to solve them and to improve both the productivity and reliability of these new-style development assistants. This is the aim of the Inria LLM4Code Challenge.
[Illustration: code, programming, AI © Inria / Photo M. Génon]

With the advent of ChatGPT, the world discovered how “generative” Artificial Intelligence could help write e-mails, novels, translations, reports and essays. Developers were also quick to seize on this technology to write their programs more easily. And it wasn't long before specialized assistants appeared to make their work even easier.

Copilot is an emblematic example. Designed with the help of OpenAI, the company behind ChatGPT, this AI from GitHub, the world's largest code production platform, slots into the developer's integrated development environment (IDE) as a simple extension.

"More and more developers are using the tool to do programming and program design-related tasks: code comprehension, bug fixing, software migration and so on. So, clearly, something is happening. And in a way, it's an extension of what already exists. When you use a word processor, there's a spell-checker to flag up any mistakes. When we write code, there's an IDE that uses different colors to give the text legibility, that points out our mistakes, that makes suggestions, etc."

[Portrait: Mathieu Acher]

"Now, generative AI can provide even more interesting support, because it has the ability to produce code or explanations that are often relevant and difficult to generate without this kind of technology. This would hardly be possible with conventional approaches."

Mathieu Acher, Associate Professor and researcher in the DiverSE team, co-leader of LLM4Code

LLM4Code is a four-year Inria Challenge[1] that brings together the Argo, Cash, Deducteam, DiverSE[2], Evref, Flowers, Mnemosyne, Picube, and Spirals research teams, as well as the Inria Experimentation and Development Department (SED), the Software Heritage code archive, the Bordeaux Computer Science Research Laboratory (LaBRI) and Sopra Steria.

AI for writing formal proofs

One of LLM4Code's objectives is to offer a software building block for Rocq Prover (formerly Coq). Initiated by Inria some forty years ago, this tool is one of the world's leading proof assistants, used for the formal verification of mission-critical software (health, transport, energy, etc.). "Writing proofs in Rocq is intellectually difficult because it's mathematics. You have to use the right primitives, the right tactics, and so on. In practice, users rely on an IDE such as VS Code to write these proofs. It's a useful support, but we think we can do better. An LLM could suggest completions, snippets of Rocq, to boost productivity and let users concentrate on the essentials. We'd like to be able to provide this support, which would then be integrated into the IDE."

The Inria Challenge is also interested in Lean, a Rocq counterpart developed in the United States. "Often, programs written in the Rocq language are not available in Lean, and vice versa. With the help of AI, we'd like to translate these programs in both directions so that they're accessible to everyone. And speaking of Lean, it's interesting to note in passing that a mathematician like Terence Tao, winner of the 2006 Fields Medal, is already experimenting with AI in conjunction with this language to prove theorems. He uses it to obtain suggestions, for example."
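
To give a concrete flavor of what such suggestions look like, here is a toy example in Lean 4 (purely illustrative, not taken from the project): the closing tactic line is exactly the kind of completion an AI assistant could propose while the user writes the proof.

    -- Toy Lean 4 lemma: addition on natural numbers is commutative.
    -- The final "exact" line is the sort of step an LLM assistant could suggest.
    theorem add_comm_example (a b : Nat) : a + b = b + a := by
      exact Nat.add_comm a b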

AI, software engineering and formal methods

The scientists involved come from three fields: AI, software engineering and formal methods. "We believe that combining these areas of expertise will enable us to offer interesting contributions."

Artificial Intelligence used for program development is based on what are known as large language models, or LLMs. Containing billions of parameters, these immense neural networks are trained on massive data sets: in this case, millions of lines of code from existing software. Initial training requires gigantic computing resources.
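
In practice, querying such a model can take just a few lines. The sketch below uses the open-source Hugging Face transformers library with a publicly released code model; the checkpoint name is only an example, not one prescribed by LLM4Code.

    # Minimal sketch: asking an open code LLM for a completion.
    # Assumes the "transformers" library is installed; the model name is
    # just an example of a publicly available code model.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "bigcode/starcoder2-3b"  # example checkpoint
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)

    prompt = 'def fibonacci(n: int) -> int:\n    """Return the n-th Fibonacci number."""\n'
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=64)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))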

A very fragile technology

"Generative AI has obvious potential. Some of the demonstrations we've seen are mind-blowing. We manage to build software with very few interactions. But... there are also cases where LLMs turn out to be totally wrong ! And that's where it gets really annoying, even for fairly standard applications that aren't necessarily mission-critical software. The user can end up with security bugs, functional bugs that cause the system to crash or consume far too many resources. So, it's a very fragile technology at the moment, and in many cases, we can't just consider LLM as a magical black box that will produce the code."

Reliability is not always there, and the same goes for productivity. "There's also a phenomenon where the LLM does 90% of the work but not the remaining 10%, and that 10% is very, very hard to achieve. All the more so when the developer first has to understand the 90% that was generated by something other than themselves."

Updates and migrations

The developer's job involves not only initial design but also maintenance, because today's software is constantly evolving, with its advantages and disadvantages... "We've all had the experience of an update gone wrong. The application becomes slower. It doesn't work as well. Or it doesn't work at all. With LLMs, the hope is to help programmers update more reliably and faster."

The researchers distinguish three types of evolution. "First, the one where the user simply changes version. Then there's the one where he reconfigures his software. He goes to the preferences menu. He changes something. And then everything falls apart... We saw this recently with the paralysis of certain airports. Finally, there's the problem of software programmed with aging languages. They're everywhere. They make the planet go round. But we can't migrate these applications reliably to new technologies. It has to be done in production. There are millions of lines of code. We can't translate line by line, because that would be incoherent. We don't have the tools to assist developers. And worst of all, we have less and less expertise in these languages as specialists retire." Hence the idea of using LLMs for modernization.

Software Heritage

[Software Heritage logo © Software Heritage]
Software Heritage is one of the partners of the Inria LLM4Code challenge.

One of the main ways of improving LLM performance is to feed it with relevant data. To this end, researchers will be able to draw on Software Heritage. Launched by Inria and supported by UNESCO, this open archive collects publicly available software in source code form. It hosts 19 billion unique files from over 300 million projects.

Another research project, Code Commons, aims to provide a new data infrastructure, based on Software Heritage, to train or specialize LLMs in software code generation. This project is set to position France in the field of AI and software, and could feed into LLM4Code research.

In addition to existing data, the LLM4Code Challenge will also produce synthetic data to enhance LLM capability. "To do this, we sometimes use tools that don't come from Artificial Intelligence. For example, we can translate one language into another using compilers, which have the advantage of being reliable and deterministic. In this way, we can compose multiple combinations of tools and data (execution traces, error messages, etc.), which we then give to the LLM to make it better. The important thing is to enrich it with a diversity of sources."
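
As a rough illustration of that idea (our own assumptions, not the challenge's actual pipeline), the sketch below runs a deterministic tool, here a C compiler, over a code snippet, captures its diagnostics, and packages code and feedback together as one synthetic training record.

    # Illustrative sketch: pairing code with deterministic tool output
    # (here, gcc diagnostics) to build a synthetic training record.
    # Assumes gcc is installed; the record format is purely hypothetical.
    import json
    import pathlib
    import subprocess
    import tempfile

    def compiler_feedback(source: str) -> str:
        """Compile a C snippet with gcc and return its diagnostics."""
        with tempfile.TemporaryDirectory() as tmp:
            path = pathlib.Path(tmp) / "snippet.c"
            path.write_text(source)
            result = subprocess.run(
                ["gcc", "-c", str(path), "-o", str(path.with_suffix(".o"))],
                capture_output=True, text=True,
            )
            return result.stderr

    def make_record(source: str) -> dict:
        """Turn (code, tool feedback) into one synthetic fine-tuning example."""
        return {
            "prompt": "Fix this C code so that it compiles:\n" + source,
            "tool_feedback": compiler_feedback(source),
        }

    if __name__ == "__main__":
        broken = "int main() { return 0 }\n"  # missing semicolon
        print(json.dumps(make_record(broken), indent=2))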

Integrating with existing tools

Another method of specialization: reinforcement using user preferences. "I train the LLM by giving it examples. Here's my question. Here's the answer I want. Here's how to format it. This technique was used for ChatGPT, with the recruitment of many annotators."
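
A single record of that kind of preference data might look like the sketch below (an illustrative format of our own, not the challenge's or OpenAI's actual schema).

    # Hypothetical example of one "question / expected answer / formatting"
    # record used to specialize an LLM on user preferences.
    preference_example = {
        "question": "Write a Python function that returns the first n prime numbers.",
        "expected_answer": (
            "def first_primes(n):\n"
            "    primes, candidate = [], 2\n"
            "    while len(primes) < n:\n"
            "        if all(candidate % p for p in primes):\n"
            "            primes.append(candidate)\n"
            "        candidate += 1\n"
            "    return primes"
        ),
        "formatting": "Return only the code, without explanations.",
    }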

The research results will essentially take the form of software building blocks and extensions that the researchers hope to integrate into existing IDEs, such as VS Code.

Do all these advances augur the demise of the developer? "No," Mathieu Acher assures us, "this kind of talk is the stuff of fantasy. In the current state of knowledge, we're really not in that situation; the aim is rather to augment developers with AI."

Find out more about the LLM4Code project with Mathieu Acher (in French)

  • Further information (in French, automatic English subtitles available)


 

[1] An Inria Challenge is a cross-disciplinary research project involving several of the institute's teams working on large-scale themes.

[2] Specializing in software engineering, DiverSE is a joint Inria, INSA Rennes, CNRS and Université de Rennes team, working in conjunction with IRISA.