Fine-Tuning
Retraining an AI model on your own data. When it's worth it, when it's not, and how it works.
What fine-tuning is
Fine-tuning is taking a pre-trained model (like GPT-4o or Claude) and training it further on your own dataset. The result is a model that has the original model's general capabilities plus specialized behavior on your data.
If you fine-tune GPT-4o on 1,000 examples of your brand's copywriting, the resulting model will naturally produce copy in your brand's voice — without you having to explain it in every prompt.
Fine-tuning vs prompting — the real comparison
Most people reach for fine-tuning too early. Prompting is faster, cheaper, and easier to iterate.
Prompting is better when:
You're still figuring out what "good output" looks like
You need flexibility (different tasks, different tones)
Your dataset is small
Fine-tuning is better when:
You have hundreds or thousands of high-quality examples
You need consistent, specific behavior at scale
You want to reduce prompt length (fine-tuned behavior is baked in)
You're calling the API thousands of times and cost matters
How fine-tuning works
1. Prepare training data — pairs of inputs and ideal outputs. Typically 100–10,000 examples for useful results.
2. Upload to the provider's fine-tuning API (OpenAI, Anthropic, or open-source alternatives)
3. Training runs (hours to days depending on dataset size)
4. Test and evaluate the fine-tuned model
5. Deploy — use it via API like the base model
The training process doesn't change what the model knows — it changes how it behaves.
Practical costs
OpenAI fine-tuning (GPT-4o mini): approximately $0.008 per 1,000 training tokens. A dataset of 1,000 examples at 500 tokens each costs ~$4 to train. Inference is slightly more expensive than the base model.
Anthropic fine-tuning: available through their API, pricing varies by model.
The cost of fine-tuning is rarely the limiting factor. Preparing high-quality training data is.
When to actually use it
Start with prompting. If you find yourself using the same long system prompt repeatedly, have a consistent task, and have examples of ideal outputs — that's when fine-tuning makes sense.
The clearest signal: you've spent weeks prompting and the output quality has plateaued. You've done everything prompting can do. Now fine-tuning can take you further.
Start with prompting. Fine-tune when you've exhausted what prompts can do and have the data to justify it.