April 10, 2026

How to Fine-Tune an Open Model for Your AI Agent

Make your agent smarter at the tasks that matter most to you with custom model training.

Author
Team Tulip

Quick Answer

Fine-tuning is the process of taking a pre-trained AI model and training it further on your own data so it performs better at specific tasks. For AI agents, this means you can make your model more reliable at tool calling, better at following your preferred style, more knowledgeable about your domain, and more consistent in its responses. Modern techniques like LoRA make fine-tuning accessible without massive compute budgets — you can fine-tune a model in under an hour for a few pounds.

Why Fine-Tune Instead of Just Prompting?

Prompting is the first tool most people reach for when they want to change how their agent behaves. Write a detailed SOUL.md, give it examples, tell it what to do and what not to do. This works well for many use cases, and you should always start here.

But prompting has limits. Every instruction you add to your prompt consumes context tokens, leaving less room for the actual conversation. Complex instructions can conflict with each other. And some behaviours are hard to teach through prompting alone — like consistently using a specific output format, or reliably calling the right tool in edge cases.

Fine-tuning bakes these behaviours into the model itself. Instead of telling the model what to do every single time, you train it so those behaviours become its default. The result is a model that needs shorter prompts, makes fewer mistakes on your specific tasks, and responds more consistently.

Think of it like this: prompting is giving someone detailed instructions before each task. Fine-tuning is training them so they already know how to do the task.

What Can You Fine-Tune For?

Better tool calling. If your agent frequently picks the wrong tool or passes incorrect parameters, fine-tuning on examples of correct tool usage can dramatically improve reliability. This is one of the highest-value applications of fine-tuning for agents.

Domain expertise. Train your model on your industry's documentation, terminology, and common questions. A fine-tuned model for real estate, healthcare, legal, or any specialised field will give more accurate and relevant responses than a general-purpose model.

Consistent style and tone. If you want your agent to always respond in a particular way — formal and concise, or friendly and detailed, or matching your brand voice — fine-tuning on examples of your preferred style is more reliable than prompting.

Specific output formats. If your agent needs to consistently produce structured outputs like JSON, specific report formats, or templated responses, fine-tuning teaches the model to produce these formats reliably.

Understanding LoRA: Fine-Tuning Without the Cost

Full fine-tuning means updating every parameter in the model, which requires enormous compute resources. A full fine-tune of a 70B parameter model could cost thousands of pounds and require multiple high-end GPUs.

LoRA (Low-Rank Adaptation) changed the game. Instead of updating all parameters, LoRA adds a small set of trainable parameters alongside the frozen original model. This cuts the number of trainable parameters by orders of magnitude, shrinking memory and compute requirements accordingly, and produces results that are nearly as good as full fine-tuning for most use cases.

In practical terms, LoRA means you can fine-tune a 7B model on a single consumer GPU in under an hour, or fine-tune a 14B model on a cloud GPU for a few pounds. It has made fine-tuning accessible to individuals and small teams, not just large organisations with massive compute budgets.
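The parameter savings are easy to see in code. Here is a rough NumPy sketch (the matrix sizes are illustrative): instead of updating a full d × d weight matrix, LoRA trains two small matrices B (d × r) and A (r × d) and adds their product to the frozen weights.

```python
import numpy as np

d, r = 4096, 8  # hidden size typical of a 7B-class layer; LoRA rank

# Full fine-tuning would update every entry of the d x d weight matrix.
full_params = d * d

# LoRA freezes W and trains only B (d x r) and A (r x d).
lora_params = d * r + r * d

rng = np.random.default_rng(0)
W = rng.standard_normal((d, d))         # frozen pretrained weights
B = np.zeros((d, r))                    # B starts at zero, so the adapter is a no-op at first
A = rng.standard_normal((r, d)) * 0.01  # A gets a small random init

W_adapted = W + B @ A                   # effective weights at inference time

print(f"full: {full_params:,} trainable params, LoRA: {lora_params:,}")
print(f"reduction: {full_params // lora_params}x fewer trainable params")
```

At rank 8 on a 4096-wide layer, that is 65,536 trainable parameters instead of roughly 16.8 million, a 256x reduction for that single matrix.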

What Data Do You Need?

The quality of your fine-tune depends entirely on the quality of your training data. You need examples of the behaviour you want the model to learn, formatted as conversation pairs: an input (what the user or system says) and the ideal output (the response the model should give).

For tool calling improvement, this means examples of user requests paired with the correct tool call the model should make, including the right tool name, parameters, and format.

For style training, this means examples of prompts paired with responses written in your desired style.

For domain expertise, this means question-answer pairs drawn from your domain knowledge base.

You do not need millions of examples. For most agent fine-tuning tasks, 100 to 500 high-quality examples produce meaningful improvements. Some practitioners report noticeable gains with as few as 50 well-crafted examples. Quality matters far more than quantity — every example should be something you would be happy for the model to copy exactly.
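To make the format concrete, here is what a single tool-calling example might look like. The message schema, the `get_weather` tool, and its arguments are all hypothetical; match whatever format your training platform specifies.

```python
# One tool-calling training example in a chat-messages format.
# The schema, tool name, and argument keys below are illustrative only.
example = {
    "messages": [
        {"role": "system", "content": "You can call tools to answer questions."},
        {"role": "user", "content": "What's the weather in Bristol tomorrow?"},
        {
            "role": "assistant",
            "content": None,
            "tool_calls": [{
                "name": "get_weather",
                "arguments": {"location": "Bristol, UK", "date": "tomorrow"},
            }],
        },
    ]
}
```

The pattern is the same for style or domain training: the final assistant message is simply the ideal text response rather than a tool call.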

The Fine-Tuning Process

The basic workflow has four steps. First, prepare your dataset. Collect or create examples of the behaviour you want. Format them according to your chosen platform's requirements — most use a simple JSON format with message arrays.
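Most platforms accept one JSON object per line (JSONL), each holding a messages array. A minimal sketch of preparing such a file, assuming simple input/output pairs (the example texts are invented):

```python
import json

# Hypothetical raw examples: (user input, ideal model output) pairs.
pairs = [
    ("Summarise this ticket: printer offline since Monday.",
     "Summary: Printer offline since Monday. Priority: medium."),
    ("Summarise this ticket: login page returns a 500 error.",
     "Summary: Login page failing with HTTP 500. Priority: high."),
]

with open("train.jsonl", "w", encoding="utf-8") as f:
    for user_text, ideal_reply in pairs:
        record = {"messages": [
            {"role": "user", "content": user_text},
            {"role": "assistant", "content": ideal_reply},
        ]}
        f.write(json.dumps(record) + "\n")  # one JSON object per line
```

Check the exact field names your chosen platform expects before uploading; some also require a system message in every record.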

Second, choose your base model. Start with the model you are already using with your agent. If you are running Qwen 3.5 on Tulip, fine-tune Qwen 3.5. Starting from a model you know works well gives you the best foundation.

Third, run the training. Upload your dataset and configure the training parameters. The key settings are learning rate (how aggressively the model learns — too high and it forgets its general knowledge, too low and it does not learn your examples), number of epochs (how many times it goes through your data — usually one to three is enough), and LoRA rank (controls the capacity of the adaptation — higher ranks learn more but use more memory).
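As a sketch, those three settings might look like the configuration below. The values are common starting points rather than universal defaults, and the key names are illustrative; adapt them to your platform's API.

```python
# Common starting values for a LoRA fine-tune (illustrative, not prescriptive).
config = {
    "learning_rate": 2e-4,  # too high: forgets general knowledge; too low: learns nothing
    "num_epochs": 2,        # one to three passes over the data is usually enough
    "lora_rank": 16,        # adapter capacity: higher learns more, uses more memory
    "lora_alpha": 32,       # scaling factor, often set to roughly 2x the rank
}
```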

Fourth, evaluate the result. Test your fine-tuned model on examples it has not seen before. Does it reliably produce the behaviour you trained for? Does it still handle general tasks well? If the results are not good enough, iterate on your data and training settings.
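A lightweight way to run that fourth step is an exact-match check over held-out examples. In this sketch, `generate` is a stand-in for a call to your deployed model, and the classification task and labels are invented for illustration.

```python
# Minimal evaluation harness. `generate` is a stub: replace it with a real
# call to your fine-tuned model (e.g. an HTTP request to your deployment).
held_out = [
    {"prompt": "Classify: 'refund not received'", "expected": "billing"},
    {"prompt": "Classify: 'app crashes on launch'", "expected": "bug"},
    {"prompt": "Classify: 'how do I export data?'", "expected": "how-to"},
]

def generate(prompt: str) -> str:
    # Stub that mimics a model; swap in your real inference call here.
    canned = {"refund": "billing", "crashes": "bug", "export": "how-to"}
    return next((v for k, v in canned.items() if k in prompt), "unknown")

def exact_match_accuracy(examples) -> float:
    hits = sum(generate(ex["prompt"]) == ex["expected"] for ex in examples)
    return hits / len(examples)

print(f"exact-match accuracy: {exact_match_accuracy(held_out):.0%}")
```

Exact match suits structured outputs like tool calls and JSON; for style or domain training you will want a fuzzier score, such as human review of a sample.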

Where to Fine-Tune

Several platforms make fine-tuning accessible. Together AI offers a comprehensive fine-tuning platform with support for LoRA, DPO, and continued training across many open models. Replicate lets you fine-tune image models like FLUX with a simple API. RunPod provides raw GPU access if you want full control over the training process.

For people running agents on Tulip, the workflow is to fine-tune your model using one of these platforms, then deploy the fine-tuned model on Tulip for inference. Tulip supports custom model deployments, so you can run your specialised model alongside your OpenClaw agent.

Common Mistakes to Avoid

Over-fitting is the biggest risk. If you train too long or on too few examples, the model memorises your training data rather than learning general patterns. It will perform brilliantly on tasks identical to your training examples and poorly on everything else. Use a held-out validation set to catch this.

Another common mistake is degrading general capabilities. If you fine-tune aggressively on a narrow task, the model may forget how to do other things well. Keep your fine-tuning focused and use a conservative learning rate to preserve the model's broad knowledge.

Finally, do not fine-tune when prompting would work. Fine-tuning is powerful but it adds complexity to your pipeline. If you can get the behaviour you want through a well-crafted SOUL.md file, that is simpler and easier to iterate on.

Frequently Asked Questions

How much does fine-tuning cost?

With LoRA on a cloud platform, fine-tuning a 7B-14B model on a few hundred examples typically costs between two and ten pounds. Larger models and bigger datasets cost more, but it is no longer the expensive process it used to be.

Do I need to know how to code?

Some coding knowledge helps, particularly Python, but several platforms now offer no-code fine-tuning interfaces. Together AI and Replicate both have web-based workflows where you upload your data and click train.

Can I fine-tune and still use MCP skills?

Absolutely. Fine-tuning changes the model's behaviour but does not affect OpenClaw's skill system. Your fine-tuned model still connects to all the same MCP skills. In fact, fine-tuning specifically for better tool calling can make your agent more effective at using its skills.

How often should I re-fine-tune?

Whenever your requirements change significantly or you accumulate enough new examples to meaningfully improve performance. Many teams fine-tune quarterly, updating the model with new data from real agent interactions.

Get Started

Deploy an agent, today

Run your first agent on Tulip in a few clicks
Deploy Agent