Fine-tune an open source LLM on your own data

Fine-tuning lets you bring your own data to improve an LLM's performance on your specific use case. Airtrain lets you upload a dataset, run a fine-tuning job, and then export the resulting model so you can use it for inference in your own environment. The general steps are:

  • Determine whether fine-tuning is a good fit for your use case
  • Prepare a dataset to tune on
  • Tune a model
  • Evaluate the model
  • Export the model
  • Serve the model

When and why to fine-tune

Fine-tuning is an extra training stage that happens after a foundation model has been trained. Since the foundational training stages "bake" knowledge into a model, it is commonly believed that fine-tuning can likewise be used to teach a model new knowledge (ex: teaching it the API of a software library by tuning it on the library's documentation). In general, however, fine-tuning is less well suited to adding knowledge than to influencing the general form or structure of the model's generated inferences. For example, if you want your model to always produce JSON that follows a particular structure, or to always write in a Shakespearean tone, then fine-tuning might be a good fit for you. A good rule of thumb is to start customizing the behavior of a model by adjusting its system prompt; then, if you find it is not consistently following instructions after prompt engineering, move on to fine-tuning.

Preparing your tuning dataset

Your tuning dataset should be a JSONL file where every row shares the same JSON schema. One or more of the fields should be used to construct the prompt for tuning, and one or more should be used to construct the desired response. For example, suppose we wanted a model that tells us about the weather and then makes a brief comment on it. We might prepare a dataset like the one below, with the "weather" and "high" fields used to generate the model prompt, and possibly all of the fields used to build an example of how we'd like the model to respond.

{"weather": "cloudy", "high": "10", "commentary": "It'll be darker than a poem by Edgar Allan Poe!"}
{"weather": "sunny", "high": "27", "commentary": "You'd better grab your shades!"}
{"weather": "rainy", "high": "12", "commentary": "It's probably a good day to cozy up by the fire."}

Submit a tuning job

In the Airtrain UI, click New Task and then select Fine-tune Models. Then, follow these steps:

  • Give your run a memorable name, so you can find it in your jobs list later (ex: "My weather tuning")
  • Upload your dataset from the previous step. 90% of its rows will be used for training; the remaining 10% will be used for an automatic evaluation once training is done.
  • Choose a name for your model variant. It should be descriptive, and composed only of alphanumeric characters and the '-' character (ex: "weather-summarizer"). If you reuse an existing variant slug, the tuned model that's produced may have a -N appended to the end, to indicate the Nth variant with that name.
  • Select the number of epochs. It is generally preferable to have a dataset large enough to achieve your desired performance with only one epoch of training, to avoid over-fitting to your particular dataset.
  • Specify a prompt template. You can use {{field_name}}-style variables to refer to fields that should be populated from the contents of your uploaded dataset (ex: Tell the user the weather is {{weather}} with a high of {{high}}. Then make a playful comment about the weather.). Read more about prompt templates here.
  • Specify a response template. This template will be used to populate what a desirable response from the model should look like for the given prompt (ex: The weather today will be {{weather}} with a high of {{high}} Celsius. {{commentary}}). It is not uncommon for this template to only consist of a single template variable. If, for example, we only wanted the model to generate commentary on the weather, we might have this field be {{commentary}}. Read more about prompt templates here.
  • Configure any evaluation metrics. This should be done as described in the Batch evaluation docs.
  • Double-check your configuration. Make sure your prompt and response templates are as desired.
  • Click Start Run. This will begin a tuning run that uses Mistral 7B Instruct v0.2 as its base.
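To make the template mechanics concrete, here is a minimal sketch of how {{field_name}} variables get filled in from a dataset row. This is an illustration only; Airtrain's actual template engine may differ in details such as whitespace and escaping.

```python
import re

def render(template: str, row: dict) -> str:
    # Replace each {{field_name}} with the matching value from the dataset row.
    return re.sub(r"\{\{(\w+)\}\}", lambda m: str(row[m.group(1)]), template)

prompt_template = (
    "Tell the user the weather is {{weather}} with a high of {{high}}. "
    "Then make a playful comment about the weather."
)
response_template = (
    "The weather today will be {{weather}} with a high of {{high}} Celsius. "
    "{{commentary}}"
)

row = {"weather": "sunny", "high": "27",
       "commentary": "You'd better grab your shades!"}

print(render(prompt_template, row))
print(render(response_template, row))
```

During tuning, each dataset row yields one (prompt, response) training pair in this way.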

When your fine-tuning job is complete, you'll automatically receive an email with a link to the job results. The results contain a link to export your model as well as batch evaluation results showing how it performed (after tuning) on the metrics you specified.


Full chat support

You may notice that Airtrain's model configuration is structured around a single prompt/response pair, rather than a full chat/conversation involving multiple turns between the user and assistant. Support for full chats is on our roadmap; if you need it, please reach out on our Slack.
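In the meantime, one possible workaround (not an official Airtrain feature, just a sketch) is to flatten a multi-turn conversation into a single pair: the chat history becomes the prompt field and the final assistant turn becomes the response field. The transcript format below is an assumption; pick whatever convention suits your data.

```python
def flatten_chat(messages: list[dict]) -> dict:
    """Collapse a multi-turn chat into one prompt/response training pair."""
    *history, last = messages
    assert last["role"] == "assistant", "last turn must be the assistant's"
    # Render earlier turns as a simple "Role: text" transcript.
    prompt = "\n".join(
        f'{m["role"].capitalize()}: {m["content"]}' for m in history
    )
    return {"prompt": prompt, "response": last["content"]}

chat = [
    {"role": "user", "content": "Will it rain today?"},
    {"role": "assistant", "content": "Yes, pack an umbrella."},
    {"role": "user", "content": "What about tomorrow?"},
    {"role": "assistant", "content": "Tomorrow looks sunny."},
]
pair = flatten_chat(chat)
```

Each flattened pair can then be written as one row of your JSONL tuning dataset.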

Evaluate the model

Before you deploy your model for use on real data, it's a good idea to make sure it performs well enough for your use case. You can use any models you've trained in an Airtrain Batch evaluation, defining custom metrics and even comparing against other models or variants. Your fine-tuned models will show up in the "Your Mistral fine-tunings" card in the "Model selection" column; select them by the variant name you chose when tuning.

Export the model

Once you've decided to use a particular fine-tuning, go to the job that produced the model (ex: by following the link in your email, or by finding it on the Jobs page listing your past jobs). Then select the "Models" tab for the job.

Click on the Download link to download a tar file containing your model and its weights. After extracting the tar, you can upload the model to Hugging Face if you like, or simply upload the files to any cloud storage of your choosing.
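A short sketch of the extraction step, using Python's standard tarfile module. The archive name and internal layout below are hypothetical stand-ins (the snippet creates a dummy archive so it is self-contained); in practice you would extract the tar you downloaded from the "Models" tab.

```python
import pathlib
import tarfile

# Hypothetical filename: whatever you downloaded from the "Models" tab.
archive = "weather-summarizer.tar"

# For this self-contained sketch only: fabricate a stand-in archive with a
# dummy weights directory. Skip this part when you have the real download.
pathlib.Path("model").mkdir(exist_ok=True)
pathlib.Path("model/config.json").write_text("{}")
with tarfile.open(archive, "w") as tar:
    tar.add("model")

# Extract the archive; the resulting directory holds the model files
# in Hugging Face format, ready to upload or serve.
with tarfile.open(archive) as tar:
    tar.extractall("exported-model")
```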

Serve the model

The weights you've exported from Airtrain are compatible with the Hugging Face ecosystem, and can be served via any inference server capable of serving Mistral 7B from the Hugging Face Hub. Popular options include vLLM and Hugging Face's own text-generation-inference server.