This work is supported and funded by the IBCO-CIMI-CNRS research project (call 2021-2024).
We introduce LLM4MDE, a novel approach that combines in-context learning and iterative prompting for program verification, ensuring that programs written in domain-specific languages are syntactically valid and incorporate environment constraints.
Our solution starts with an input prompt holding two parameters:
To improve prompt quality, we combine multiple prompting techniques: grammar prompting, Chain-of-Thought (CoT), and tool use.
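As an illustration of how the three techniques can be composed into a single input prompt, here is a minimal sketch. The DSL grammar snippet, the `build_prompt` helper, and the `validate_xmi` tool name are assumptions for illustration, not part of LLM4MDE itself.

```python
# Hypothetical sketch: composing grammar prompting, Chain-of-Thought,
# and tool use into one prompt. The grammar fragment is illustrative.

GRAMMAR = """Process ::= 'process' ID '{' WorkDefinition* '}'
WorkDefinition ::= 'wd' ID"""

def build_prompt(task: str, grammar: str = GRAMMAR) -> str:
    parts = [
        "You are a model generator for a domain-specific language.",
        # Grammar prompting: the output must conform to this BNF.
        "Grammar (BNF) the output must conform to:",
        grammar,
        # Chain-of-Thought: ask for step-by-step reasoning first.
        "Think step by step before emitting the final model.",
        # Tool use: expose a (hypothetical) conformance checker.
        "You may call the tool `validate_xmi(model)` to check conformance.",
        f"Task: {task}",
        "Return the result serialized as XMI.",
    ]
    return "\n\n".join(parts)

prompt = build_prompt("Model a two-step software process.")
```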
The LLM processes the input prompt and generates the serialized result in XMI format. With an Ecore meta-model as the target output, the inference:
The process involves:
Validated outputs are stored in a database for further analysis and fine-tuning of the LLM, facilitating iterative refinement based on feedback and real-world scenarios.
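The validate-then-store step described above could be sketched as follows; the well-formedness check and the `validated` table schema are assumptions (a real pipeline would additionally check conformance to the Ecore meta-model).

```python
# Sketch, under stated assumptions: accept an LLM output only if it is
# well-formed XML (a necessary condition for valid XMI), then persist
# the prompt/output pair for later analysis and fine-tuning.
import sqlite3
import xml.etree.ElementTree as ET

def is_well_formed(xmi: str) -> bool:
    """Return True if the string parses as XML."""
    try:
        ET.fromstring(xmi)
        return True
    except ET.ParseError:
        return False

def store_if_valid(db: sqlite3.Connection, prompt: str, xmi: str) -> bool:
    """Store the pair only when the output passes validation."""
    if not is_well_formed(xmi):
        return False
    db.execute("CREATE TABLE IF NOT EXISTS validated (prompt TEXT, xmi TEXT)")
    db.execute("INSERT INTO validated VALUES (?, ?)", (prompt, xmi))
    return True

db = sqlite3.connect(":memory:")
ok = store_if_valid(db, "p1", '<xmi:XMI xmlns:xmi="http://www.omg.org/XMI"/>')
```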
For unresolved ambiguities or validation failures, we re-prompt the LLM with:
Once sufficient validated models have been gathered, the stored results are used to fine-tune our LLMs.
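The iterative-prompting loop described above (re-prompting on validation failure with the error fed back) can be sketched as a bounded retry loop. The function and parameter names are hypothetical; `generate` stands in for an actual LLM call and `validate` for the XMI checker.

```python
# Hypothetical sketch of iterative prompting: each failed attempt and
# its validator error are appended to the prompt, up to a retry budget.

def iterative_prompting(generate, validate, prompt, max_rounds=3):
    """Return the first attempt that validates, or None if the budget runs out."""
    for _ in range(max_rounds):
        attempt = generate(prompt)
        ok, error = validate(attempt)
        if ok:
            return attempt
        # Feed the failure back so the next attempt can correct it.
        prompt += (f"\n\nPrevious attempt:\n{attempt}\n"
                   f"Validator error: {error}\nFix and re-emit.")
    return None
```

The budget keeps cost bounded; unresolved cases after `max_rounds` fall back to manual inspection.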
The following table depicts the distribution of use case resolutions for each model. These statistics were collected from the resolution rate of a single use case (SimplePDL) over 40 iterations with our top-performing models.
| Model | Context provided | Correct output | N |
|---|---|---|---|
| GPT-4o | Zero-Shot | 0 | 40 |
| | Our prompt | 2 | 40 |
| | Iterative prompting | 32 | 40 |
| meta-llama/Meta-Llama-3-70B-Instruct | Zero-Shot | 0 | 40 |
| | Our prompt | 21 | 40 |
| | Iterative prompting | 30 | 40 |
| mistralai/Mixtral-8x7B-Instruct-v0.1 | Zero-Shot | 0 | 40 |
| | Our prompt | 8 | 40 |
| | Iterative prompting | 18 | 40 |
To improve our model's performance, we conducted a fine-tuning process:
| Model | Context | Correct output | Similarity |
|---|---|---|---|
| GPT-4o | Zero-Shot | 0 | 0.414 |
| | Our prompt | 0 | 0.6183 |
| | Iterative prompting | 0.18 | 0.7041 |
| GPT-3.5-Turbo | Zero-Shot | 0 | 0.0589 |
| | Our prompt | 0.02 | 0.6825 |
| | Iterative prompting | 0.02 | 0.5845 |
| Fine-tuned GPT-3.5-Turbo | Zero-Shot | 0.16 | 0.7807 |
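The excerpt does not define how the similarity scores are computed; as one plausible illustration (an assumption, not necessarily the authors' metric), a sequence-based similarity between a generated model and a reference model can be obtained with `difflib`:

```python
# Illustrative only: sequence-based similarity between generated and
# reference serializations, in [0, 1]. The metric choice is an assumption.
from difflib import SequenceMatcher

def similarity(generated: str, reference: str) -> float:
    """Ratio of matching characters between the two serializations."""
    return SequenceMatcher(None, generated, reference).ratio()

s = similarity("<xmi a='1'/>", "<xmi a='2'/>")
```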