Models

The model is the reasoning engine at the heart of every AI system. It's the component that actually thinks: it takes an input, applies learned patterns, and produces an output. But "best model" is a misleading concept. The right model depends entirely on what you're asking it to do, where you're running it, and which constraints matter.

Three Categories

The model landscape divides into three broad categories, each with distinct trade-offs around capability, cost, control, and privacy.

Commercial Models

Claude, GPT, Gemini — the frontier models from Anthropic, OpenAI, and Google. These are the most capable general-purpose reasoning engines available. They're accessed via API, which means data leaves your infrastructure, but they offer the highest quality for complex tasks like multi-step reasoning, long-document analysis, and nuanced generation.

  • Strongest general reasoning
  • Largest context windows
  • Pay-per-token, no infrastructure to manage
  • Data travels to the provider's servers

Open-Source Models

Llama, Mistral, Qwen, DeepSeek — freely available models you can download, inspect, and run on your own infrastructure. The capability gap with commercial models has narrowed dramatically. For many production tasks, these models deliver comparable quality with full data sovereignty and no per-token costs.

  • Complete data control and privacy
  • No vendor lock-in or usage fees
  • Self-hostable on your own GPUs
  • Requires infrastructure investment

Fine-Tuned Models

Take a base model — commercial or open-source — and train it further on your specific data and tasks. The result is a specialist that can outperform a generalist model twice its size on your particular workload. Fine-tuning is how you turn a good model into your model.

  • Domain-specific accuracy gains
  • Smaller model, same or better quality
  • Lower inference cost per request
  • Requires curated training data

Choosing the Right Model

The question is never "what's the best model?" — it's "what's the best model for this specific task, within these specific constraints?" I help clients evaluate models against four dimensions that actually matter.

Cost

Frontier models charge per token. At scale, those costs compound fast. A task that runs thousands of times per day might be better served by a smaller open-source model running on dedicated hardware — same quality for the use case, predictable fixed cost instead of a growing variable one.
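The variable-vs-fixed trade-off above can be sketched as a simple break-even calculation. All numbers here (per-token price, tokens per request, GPU cost) are illustrative assumptions, not quotes from any provider:

```python
# Illustrative break-even sketch: API per-token pricing vs. a fixed
# self-hosted GPU cost. Every price here is a made-up assumption.

def monthly_api_cost(requests_per_day: int, tokens_per_request: int,
                     price_per_million_tokens: float) -> float:
    """Variable cost: grows linearly with traffic."""
    tokens = requests_per_day * 30 * tokens_per_request
    return tokens / 1_000_000 * price_per_million_tokens

def break_even_requests_per_day(fixed_monthly_cost: float,
                                tokens_per_request: int,
                                price_per_million_tokens: float) -> float:
    """Daily volume above which self-hosting is cheaper than the API."""
    cost_per_request = tokens_per_request / 1_000_000 * price_per_million_tokens
    return fixed_monthly_cost / (30 * cost_per_request)

# Assumed numbers: $15 per million tokens, 2,000 tokens per request,
# $1,200/month for a dedicated GPU instance.
api = monthly_api_cost(5_000, 2_000, 15.0)
threshold = break_even_requests_per_day(1_200, 2_000, 15.0)
print(f"API cost at 5,000 req/day: ${api:,.0f}/month")
print(f"Self-hosting breaks even above {threshold:,.0f} requests/day")
```

Under these assumed numbers, the API bill at 5,000 requests per day is $4,500/month, and a $1,200/month dedicated instance pays for itself above roughly 1,333 requests per day. The point isn't the exact figures, which depend on real pricing; it's that the crossover arrives sooner than most teams expect.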

Latency

A user-facing chatbot needs sub-second responses. A batch document processing pipeline can tolerate minutes. The same model that's perfect for one is wrong for the other. Smaller models respond faster, and where you run them matters as much as which model you pick.

Data Sensitivity

If the data can't leave your infrastructure — regulated industries, legal documents, medical records — that rules out standard commercial API calls. The choice becomes self-hosted open-source, a private deployment arrangement with a commercial provider, or on-premise inference.

Task Complexity

Not every task needs a frontier model. Classification, extraction, and formatting tasks can run on small, fast models. Multi-step reasoning, creative generation, and ambiguous problem-solving genuinely benefit from larger, more capable models. Match the model to the task.

Pairs With

Models don't operate in isolation. Their value changes depending on what surrounds them in the stack.

Inference

The model is the brain; inference is where it runs. The same model deployed on a cloud API, a self-hosted GPU cluster, or an edge device will have completely different cost profiles, latency characteristics, and data sovereignty properties. Choosing a model and choosing an inference strategy are two halves of the same decision.

Inputs

Different models respond differently to the same prompt. The system prompt structure, few-shot examples, and formatting conventions that work brilliantly with one model can fall flat with another. Prompt engineering is always model-specific — when you change the model, you revisit the inputs.
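One way to keep that model-specificity manageable is to treat prompt assembly as a per-family rendering step. The sketch below is hypothetical: the family names and formatting conventions are illustrative assumptions, not any provider's official format.

```python
# Hypothetical sketch: the same logical prompt rendered in two
# family-specific layouts. Family names and conventions are
# illustrative assumptions, not official API formats.

def build_prompt(model_family: str, system: str,
                 examples: list[tuple[str, str]], user: str) -> str:
    """Assemble one logical prompt in a family-specific layout."""
    if model_family == "markdown-headers":
        # Some models follow structured markdown sections more reliably.
        shots = "\n".join(f"Input: {q}\nOutput: {a}" for q, a in examples)
        return (f"# Instructions\n{system}\n\n"
                f"# Examples\n{shots}\n\n# Task\n{user}")
    if model_family == "xml-tags":
        # Others respond better to explicit XML-style delimiters.
        shots = "\n".join(
            f"<example><input>{q}</input><output>{a}</output></example>"
            for q, a in examples)
        return (f"<instructions>{system}</instructions>\n"
                f"{shots}\n<task>{user}</task>")
    raise ValueError(f"unknown family: {model_family}")

prompt = build_prompt("xml-tags", "Classify sentiment.",
                      [("Great service!", "positive")], "Slow delivery.")
```

Centralizing the layout this way means swapping models touches one rendering function instead of every call site — the system prompt, few-shot examples, and task stay the same while the packaging changes.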

Agents

Sophisticated agent architectures route between models dynamically. A small, fast model handles classification and triage. A frontier model tackles the complex reasoning steps. A fine-tuned specialist handles domain-specific extraction. The agent orchestrates the ensemble — using the right model for each sub-task instead of paying frontier prices for everything.
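The routing layer at the core of that ensemble can be surprisingly simple. A minimal sketch, assuming three deployed models whose names are placeholders:

```python
# Minimal routing sketch. The three model names are placeholders
# standing in for a small triage model, a fine-tuned specialist,
# and a frontier model.

ROUTES: dict[str, str] = {
    "classify": "small-fast-model",      # cheap, low-latency triage
    "extract":  "fine-tuned-extractor",  # domain-specific specialist
    "reason":   "frontier-model",        # complex multi-step work
}

def route(task_type: str) -> str:
    """Pick a model for a sub-task; fall back to the frontier model
    when the task type is unrecognized (safe but expensive)."""
    return ROUTES.get(task_type, "frontier-model")
```

Real routers add fallbacks, retries, and confidence thresholds, but the economics come from this table: every sub-task that resolves to a smaller model is a request you didn't pay frontier prices for.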

Need help choosing the right model?

I evaluate models against your real-world requirements — not synthetic benchmarks — and design systems that route intelligently between them as the landscape evolves.