Fine-tuning

Fine-tune an open source LLM on your own data

Fine-tuning lets you use your own data to improve an LLM's performance on your specific use case. Airtrain lets you upload a dataset, run a fine-tune, and then export the resulting model so you can use it for inference in your own environment. The general steps are:

  • Determine whether fine-tuning is a good fit for your use case
  • Prepare a dataset to tune on
  • Tune a model
  • Evaluate the model
  • Export the model
  • Serve the model

When and why to fine-tune

Fine-tuning is an extra training stage that happens after a foundation model has been trained. Because foundational training "bakes in" knowledge, it is commonly believed that fine-tuning can likewise "teach" a model new knowledge (ex: teaching it the API of a software library by tuning it on the library's documentation). In general, however, fine-tuning is less well suited to adding knowledge than it is to shaping the form or structure of the model's outputs. For example, if you want your model to always produce JSON that follows a particular structure, or to always write in a Shakespearean tone, then fine-tuning might be a good fit. A good rule of thumb is to start by adjusting the model's system prompt; if the model still does not follow instructions consistently after prompt engineering, move on to fine-tuning.

Preparing your tuning dataset

Your tuning dataset should be a JSONL file in which every row is a JSON object with the same schema. One or more of the fields should be intended for constructing the prompt, and one or more for constructing the desired response. For example, suppose we want a model that tells us about the weather and then makes a brief comment on it. We might prepare a dataset like this, with the "weather" and "high" fields used to generate the model prompt, and possibly all of the fields used to build an example of how we'd like the model to respond.

{"weather": "cloudy", "high": "10", "commentary": "It'll be darker than a poem by Edgar Allan Poe!"}
{"weather": "sunny", "high": "27", "commentary": "You'd better grab your shades!"}
{"weather": "rainy", "high": "12", "commentary": "It's probably a good day to cozy up by the fire."}
...
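
Before uploading, it can help to confirm locally that every row really does share one schema. The following is a minimal sketch, not part of Airtrain: the helper name check_jsonl_schema is hypothetical, and it only compares top-level field names, which is an assumption about what "the same schema" needs to mean here.

```python
import json


def check_jsonl_schema(lines):
    """Return the set of field names shared by every row.

    Raises ValueError if a row is not a JSON object, if two rows have
    different field names, or if there are no rows at all.
    """
    expected = None
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between rows
        row = json.loads(line)
        if not isinstance(row, dict):
            raise ValueError(f"line {line_number}: expected a JSON object")
        keys = set(row)
        if expected is None:
            expected = keys  # the first row defines the schema
        elif keys != expected:
            raise ValueError(
                f"line {line_number}: fields {sorted(keys)} "
                f"do not match {sorted(expected)}"
            )
    if expected is None:
        raise ValueError("dataset contains no rows")
    return expected


# The three example rows from above, serialized one JSON object per line.
rows = [
    {"weather": "cloudy", "high": "10",
     "commentary": "It'll be darker than a poem by Edgar Allan Poe!"},
    {"weather": "sunny", "high": "27",
     "commentary": "You'd better grab your shades!"},
    {"weather": "rainy", "high": "12",
     "commentary": "It's probably a good day to cozy up by the fire."},
]
sample_lines = [json.dumps(row) for row in rows]
print(sorted(check_jsonl_schema(sample_lines)))
```

To check a file on disk, pass an open file handle instead of the list, since iterating over a text file yields one line at a time.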

Submit a tuning job

In the Airtrain UI, click New Task and then select Fine-tune Models. Then, follow these steps:

  • Give your run a memorable name, so you can find it in your jobs list later (ex: "My weather tuning")
  • Upload your dataset from the previous step. 90% of the rows from this dataset will be used for training, while the remaining 10% will be used to automatically do an evaluation after the training is done.
  • Select "Chat" mode or "Instruct" mode: If your use case involves exactly one prompt and one response from the model with no further interaction, select "Instruct" mode. If there are multiple turns between the user and the model, select "Chat" mode. Although "Instruct" use cases are a subset of "Chat", restricting yourself to pure "Instruct" allows more flexibility in how you can use the resulting model in Airtrain.
  • Select a base model: Choose the model on which your fine-tune should be based. For more information about a given model, click the "View model card" link underneath the selected model.
  • Choose a name for your model variant. It should be descriptive, and composed only of alphanumeric characters and the '-' character (ex: "weather-summarizer"). If you reuse an existing variant slug, the tuned model that's produced may have a -N appended to the end, to indicate the Nth variant with that name.
  • Select the number of epochs. It is generally preferable to have a dataset large enough to achieve your desired performance with only one epoch of training, to avoid over-fitting to your particular dataset.
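
The naming rule for model variants above (alphanumeric characters and '-' only) can be checked before you fill in the form. This is a minimal sketch under assumptions: the function name is hypothetical, and it assumes "alphanumeric" means ASCII letters and digits, which the rule above does not spell out.

```python
import re

# Assumption: only ASCII letters, digits, and '-' are allowed.
VARIANT_NAME_PATTERN = re.compile(r"[A-Za-z0-9-]+")


def is_valid_variant_name(name: str) -> bool:
    """Return True if the name uses only letters, digits, and '-'."""
    return VARIANT_NAME_PATTERN.fullmatch(name) is not None


print(is_valid_variant_name("weather-summarizer"))  # True
print(is_valid_variant_name("My weather tuning"))   # False: contains spaces
```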

Instruct mode only

  • Specify a prompt template. You can use {{field_name}} style variables to refer to fields that should be populated from the contents of your uploaded dataset (ex: Tell the user the weather is {{weather}} with a high of {{high}}. Then make a playful comment about the weather.). Read more about prompt templates here.
  • Specify a response template. This template will be used to populate what a desirable response from the model should look like for the given prompt (ex: The weather today will be {{weather}} with a high of {{high}} Celsius. {{commentary}}). It is not uncommon for this template to only consist of a single template variable. If, for example, we only wanted the model to generate commentary on the weather, we might have this field be {{commentary}}. Read more about prompt templates here.
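
To preview what a row will look like once the two templates are filled in, you can approximate the substitution locally. The sketch below is not Airtrain's template engine: render_template is a hypothetical helper that only replaces {{field_name}} placeholders with values from one row, and tolerating spaces inside the braces is an assumption.

```python
import re

# Matches {{field_name}}; optional spaces inside the braces are an assumption.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_template(template, row):
    """Replace each {{field}} placeholder with the matching value from row."""
    def substitute(match):
        field = match.group(1)
        if field not in row:
            raise KeyError(f"template refers to unknown field {field!r}")
        return str(row[field])

    return PLACEHOLDER.sub(substitute, template)


prompt_template = (
    "Tell the user the weather is {{weather}} with a high of {{high}}. "
    "Then make a playful comment about the weather."
)
response_template = (
    "The weather today will be {{weather}} with a high of {{high}} Celsius. "
    "{{commentary}}"
)
example_row = {
    "weather": "sunny",
    "high": "27",
    "commentary": "You'd better grab your shades!",
}

print(render_template(prompt_template, example_row))
print(render_template(response_template, example_row))
```

Raising on an unknown field, rather than leaving the placeholder in place, makes a typo in a template show up before a long tuning job is launched.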

Chat mode only

If you are training the model in "Chat" mode, your input data must have a "messages" field that conforms to the format used by Hugging Face chat templates.
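
In the Hugging Face chat template format, a conversation is a list of messages, each with a "role" and a "content" string. The sketch below builds one JSONL row in that shape. The helper name, the conversation text, and the set of accepted roles are illustrative assumptions; check the model card of your base model for the roles its chat template actually supports.

```python
import json

# Assumption: these are the roles commonly used with chat templates.
# Some base models' templates do not accept a "system" message.
ALLOWED_ROLES = {"system", "user", "assistant"}


def make_chat_row(turns):
    """Serialize (role, content) pairs as one JSONL row with a "messages" field."""
    messages = []
    for role, content in turns:
        if role not in ALLOWED_ROLES:
            raise ValueError(f"unexpected role: {role!r}")
        messages.append({"role": role, "content": content})
    if not messages:
        raise ValueError("a conversation needs at least one message")
    return json.dumps({"messages": messages})


# An invented multi-turn conversation, for illustration only.
chat_row = make_chat_row([
    ("user", "What's the weather like today?"),
    ("assistant", "Cloudy with a high of 10 Celsius. "
                  "It'll be darker than a poem by Edgar Allan Poe!"),
    ("user", "And tomorrow?"),
    ("assistant", "Sunny with a high of 27 Celsius. "
                  "You'd better grab your shades!"),
])
print(chat_row)
```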

Final steps


  • Configure any evaluation metrics. This should be done as described in the Batch evaluation docs.
  • Double-check your configuration. Make sure your configuration is as desired; tuning can take several hours (or even days for large numbers of rows) and it's best not to launch until you're sure the job will do what you want!
  • Click Start Run. This will begin the tuning job and take you to a status page for it.

When your fine-tuning job is complete, you'll automatically receive an email with a link to the job results. The results contain a link to export your model as well as batch evaluation results showing how it performed (after tuning) on the metrics you specified.

Evaluate the model

Before you deploy your model on real data, it's a good idea to make sure it performs well enough for your use case. You can use any model you've trained in an Airtrain Batch evaluation, defining custom metrics and even comparing it to other models or variants. The models you have fine-tuned will show up in the "Your Mistral fine-tunings" card in the "Model selection" column. Use the variant name you chose when tuning your model to select it.

Export the model

Once you've decided to use a particular fine-tuning, go to the job that produced the model (ex: by following the link in your email, or by finding it on the Jobs page, which shows your past jobs). Then, select the "Models" tab for the job.

Click on the Download link to download a tar file containing your model and its weights. After extracting the tar, you can upload the model to Hugging Face if you like, or simply upload the files to any cloud storage of your choosing.
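
Extracting the downloaded archive can be scripted with Python's standard tarfile module. This is a minimal sketch: extract_model is a hypothetical helper, and the archive name and target directory in the usage note are placeholders, not names that Airtrain produces.

```python
import tarfile
from pathlib import Path


def extract_model(archive_path, target_dir):
    """Extract a model archive and return the relative paths of its files."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path) as archive:
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions: refuse members that would write
            # outside target_dir.
            archive.extractall(target, filter="data")
        else:
            archive.extractall(target)
    return sorted(
        path.relative_to(target).as_posix()
        for path in target.rglob("*")
        if path.is_file()
    )
```

For example, extract_model("model.tar", "my-tuned-model") would unpack a hypothetical model.tar into a my-tuned-model directory and list what it contained, which is a quick way to confirm the download is complete before uploading the files elsewhere.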

Serve the model

The weights you've exported from Airtrain are compatible with the Hugging Face ecosystem, and can be served by any inference server capable of serving Mistral 7B from the Hugging Face Hub. Popular inference servers include vLLM and Hugging Face's own text-generation-inference server.