MDE in the era of Generative AI

Ahmed ALAOUI MDAGHRI (1,2), Meriem OUEDERNI (2), Lotfi CHAARI (2)
1: Nantes University, 2: IRIT

Acknowledgment

This work is supported and funded by the IBCO-CIMI-CNRS research project (call 2021-2024).

Description

We introduce LLM4MDE, a novel approach that combines in-context learning and iterative prompting for program verification, ensuring that programs written in domain-specific languages (DSLs) are syntactically valid and incorporate environment constraints.

Main System Design

  1. Specific Prompt:

    Our solution starts with an input prompt holding two parameters:

    • The Natural Language Description (NLD) of a future DSL, including user requirements and semantic constraints
    • A document type definition of a specific modelling language (e.g., an Ecore definition file)

    We combine multiple prompting techniques to improve prompt quality: grammar prompting, Chain-of-Thought (CoT), and tool use, as sketched below.
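
A minimal sketch of how such a prompt could be assembled. The template wording, the build_prompt helper, and the Ecore.ecore path are illustrative assumptions, not the exact prompt used in LLM4MDE:

```python
# Hypothetical prompt assembly combining grammar prompting and CoT.
def build_prompt(nld: str, ecore_grammar: str) -> str:
    """Combine the DSL description (NLD) with the modelling-language grammar."""
    return (
        "You are a model-driven engineering assistant.\n"
        "Grammar (Ecore definition) the output MUST conform to:\n"
        f"{ecore_grammar}\n\n"
        "Task: derive an Ecore meta-model from the description below and "
        "serialize it as XMI.\n"
        f"Description: {nld}\n\n"
        # Chain-of-Thought: ask the model to reason before answering.
        "Think step by step: list concepts, attributes, and relationships "
        "before emitting the final XMI."
    )

nld = "A process description language with work definitions and ordering constraints."
prompt = build_prompt(nld, open("Ecore.ecore").read())  # grammar file path is an assumption
```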

  2. Output Generation:

    The LLM processes the input prompt and generates the serialized result in XMI format. With an Ecore meta-model as the target output, inference (see the sketch after this list):

    • Parses Ecore language markers
    • Extracts relevant concepts, entities, attributes, and relationships
    • Builds a structural model adhering to Ecore syntax
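
Continuing the sketch above, a hedged example of the generation call, assuming an OpenAI-compatible chat API (the model choice and client setup are assumptions):

```python
# Illustrative generation step using the OpenAI Python SDK (v1.x).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",                                  # assumed model choice
    messages=[{"role": "user", "content": prompt}],  # prompt from build_prompt()
    temperature=0,                                   # deterministic output helps validity
)
xmi_output = response.choices[0].message.content     # serialized meta-model (XMI)
```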
  3. Model Validation:

    The process involves:

    • Parsing with the domain grammar to check for syntactic errors
    • Human intervention for semantic checking if no syntactic errors are found
    • Iterative prompting with LLM agents to fix syntactic errors if any are present (a validation sketch follows)
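
A minimal sketch of the syntactic check, assuming pyecore as the Ecore parser; the check_syntax helper and the temporary file path are hypothetical:

```python
# Illustrative syntactic validation of the generated XMI with pyecore.
from pyecore.resources import ResourceSet, URI

def check_syntax(xmi_text: str, path: str = "candidate.ecore") -> list[str]:
    """Return a list of syntactic errors; an empty list means the model parses."""
    with open(path, "w") as f:
        f.write(xmi_text)
    try:
        ResourceSet().get_resource(URI(path))  # parsing enforces Ecore/XMI syntax
        return []  # valid: hand over to a human for semantic checking
    except Exception as err:  # pyecore raises on malformed input
        return [str(err)]  # feed these errors back into the iterative prompt

errors = check_syntax(xmi_output)
```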
  4. Database Storage for Use Cases:

    Validated outputs are stored in a database for further analysis and for fine-tuning the LLM, enabling iterative refinement from feedback and real-world scenarios (see the storage sketch below).
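
A minimal storage sketch, assuming a local SQLite table; the database name, schema, and column names are illustrative assumptions:

```python
# Illustrative storage of validated (NLD, XMI) pairs; schema is an assumption.
import sqlite3

conn = sqlite3.connect("llm4mde_usecases.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS validated_models ("
    "id INTEGER PRIMARY KEY, nld TEXT, xmi TEXT, model_name TEXT)"
)
conn.execute(
    "INSERT INTO validated_models (nld, xmi, model_name) VALUES (?, ?, ?)",
    (nld, xmi_output, "gpt-4o"),  # variables from the earlier sketches
)
conn.commit()
```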

  5. Ambiguity Resolution:

    For unresolved ambiguities or validation failures, we re-prompt the LLM (see the loop sketched below) with:

    • The errors that occurred
    • Additional description from the user, where available
    • Access to external tools (e.g., API search calls, documentation) to enhance understanding
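
Continuing the earlier sketches, a hedged version of the repair loop; the retry budget and repair wording are assumptions, and tool access is omitted for brevity:

```python
# Illustrative iterative-prompting loop reusing prompt, client, xmi_output,
# and check_syntax from the sketches above.
MAX_ROUNDS = 5  # assumed retry budget

for _ in range(MAX_ROUNDS):
    errors = check_syntax(xmi_output)
    if not errors:
        break  # syntactically valid: proceed to human semantic checking
    repair_prompt = (
        "The previous XMI output contained syntactic errors:\n"
        + "\n".join(errors)
        + "\nFix these errors and return the corrected XMI only."
    )
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": xmi_output},
            {"role": "user", "content": repair_prompt},  # errors fed back
        ],
    )
    xmi_output = response.choices[0].message.content
```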
  6. Fine-tuning and Model Improvements:

    Once enough validated models have been gathered, the stored results are used to fine-tune our LLMs (see the export sketch below).
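
A minimal sketch of exporting the stored pairs to a JSONL training file; the chat-style record layout and file names are assumptions modeled on common fine-tuning APIs:

```python
# Illustrative export of stored (NLD, XMI) pairs to a JSONL fine-tuning file.
import json
import sqlite3

conn = sqlite3.connect("llm4mde_usecases.db")
with open("finetune.jsonl", "w") as out:
    for nld, xmi in conn.execute("SELECT nld, xmi FROM validated_models"):
        record = {"messages": [
            {"role": "user", "content": nld},        # description as input
            {"role": "assistant", "content": xmi},   # validated XMI as target
        ]}
        out.write(json.dumps(record) + "\n")
```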

Distribution of resolutions for one use case

The following table shows the distribution of use-case resolutions for each model. The statistics were collected from the resolution rate of a single use case (SimplePDL) over 40 iterations with our top-performing models.

| Model | Context provided | Correct output | N |
|---|---|---|---|
| GPT-4o | Zero-Shot | 0 | 40 |
| GPT-4o | Our prompt | 2 | 40 |
| GPT-4o | Iterative Prompting | 32 | 40 |
| meta-llama/Meta-Llama-3-70B-Instruct | Zero-Shot | 0 | 40 |
| meta-llama/Meta-Llama-3-70B-Instruct | Our prompt | 21 | 40 |
| meta-llama/Meta-Llama-3-70B-Instruct | Iterative Prompting | 30 | 40 |
| mistralai/Mixtral-8x7B-Instruct-v0.1 | Zero-Shot | 0 | 40 |
| mistralai/Mixtral-8x7B-Instruct-v0.1 | Our prompt | 8 | 40 |
| mistralai/Mixtral-8x7B-Instruct-v0.1 | Iterative Prompting | 18 | 40 |

Fine-Tuning Job

To improve our models' performance, we conducted a fine-tuning process on the stored validated results:
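
A hedged sketch of launching such a job, assuming the OpenAI fine-tuning API and the finetune.jsonl export from step 6; the base model name is an assumption:

```python
# Illustrative fine-tuning job launch using the OpenAI Python SDK (v1.x).
from openai import OpenAI

client = OpenAI()
training_file = client.files.create(
    file=open("finetune.jsonl", "rb"),  # JSONL export from the step-6 sketch
    purpose="fine-tune",
)
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-mini-2024-07-18",  # assumed base model; provider-dependent
)
print(job.id)  # poll the job until it reports success
```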

Demo Application