When Size Matters

Bigger is better, right?

If I want better results from my LLM, I need to use the latest and greatest, the biggest and best model available, right?

Honestly, no. Probably not. There are many, many models out there and most of them will handle your average use cases just fine. Even the small ones.

So, why have bigger and "better" models? Why do Anthropic, OpenAI, Google, and others keep releasing updates to their models if there are so many that can handle most of our use cases?

It all depends on what you're using it for. Smaller models handle certain use cases just fine, while larger ones excel where the smaller ones struggle. But do you really need the biggest and best model to summarize your notes or respond to an email? Let's dive in and explore this question.

First of all, what do I mean by different models?

Well, if you go to Claude or ChatGPT or Gemini or many other clients out there, you'll usually be able to find a dropdown that indicates which model you're using. Depending on the client, you may be able to change it.

What's more, there are thousands of other models out there that you can use through tools like Ollama. Tools like these let you download and run models locally (provided your machine has the resources to do so). They vary in size from very small to very large. But when would you want to use a small one versus a large one?
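Before we get to that, here's a rough sketch of what running a model locally actually looks like. This is a minimal Python example that assumes Ollama is installed and running on its default port, and that you've already pulled a small model such as gemma:2b (any model tag you have locally works):

```python
import requests

# A minimal sketch: ask a locally running Ollama server (default port 11434)
# to generate a completion. Assumes you've already run `ollama pull gemma:2b`.
response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "gemma:2b",  # swap in whatever model you've pulled
        "prompt": "Explain what a model parameter is in one sentence.",
        "stream": False,      # return the full response in one JSON payload
    },
    timeout=120,
)
response.raise_for_status()
print(response.json()["response"])
```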

What makes a model small or large? When we talk about the size of a model, we're referring to the number of parameters it contains. Parameters are the internal numerical values (weights) that get adjusted during training. They're where the model stores what it has learned from its training data, whether that's "the sky is blue" or any other fact (or anti-fact) it picked up along the way.

Generally speaking, the more parameters a model has, the "smarter" it will be, but also the more resource intensive. So, while the Llama 4 model is supposed to be pretty amazing, it also likely can't run on your laptop because of how large the model is. However, a small model like Qwen 1.5 (in its smaller variants) can likely run on just about any machine.
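To put the resource side of that in concrete terms, here's a quick back-of-the-envelope sketch. The numbers are rough assumptions (weights only, ignoring activations and other overhead), but they show why a 70B model won't fit on a typical laptop while a 2B model will:

```python
def approx_memory_gb(params_in_billions: float, bits_per_param: int = 16) -> float:
    """Rough weights-only estimate: parameter count times bytes per parameter."""
    bytes_per_param = bits_per_param / 8
    return params_in_billions * bytes_per_param  # billions of bytes == GB

print(approx_memory_gb(2))      # ~4 GB: a 2B model at 16-bit fits on most laptops
print(approx_memory_gb(70))     # ~140 GB: a 70B model at 16-bit does not
print(approx_memory_gb(70, 4))  # ~35 GB: even 4-bit quantization still needs a lot
```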

Okay, so, resources are one constraint. What other reasons would you choose a small model vs a large one?

The other aspect is what you're using it for. Different use cases fit different sized models. For example, if you're looking for basic text summarization, simple chatbots, or basic data classification, a small model will generally do just fine with these tasks. However, if you're looking for tool usage, multi-step workflow execution, or advanced content generation, a large model will serve you much better. There's also a middle ground with medium-sized models that can reason better than small models but don't require the same resources as a large model.
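As a rough illustration of that mapping, you could imagine routing tasks to model tiers with something like this. The tiers, task names, and model tags here are placeholder assumptions, not recommendations:

```python
# Hypothetical task-to-model routing. Fill in whatever models you've actually evaluated.
MODEL_TIERS = {
    "small": "gemma:2b",    # summarization, simple chatbots, basic classification
    "medium": "qwen3:32b",  # better reasoning without large-model resource needs
    "large": "llama4",      # tool use, multi-step workflows, advanced generation
}

TASK_TO_TIER = {
    "summarize_notes": "small",
    "classify_ticket": "small",
    "draft_email_reply": "small",
    "analyze_long_report": "medium",
    "run_agentic_workflow": "large",
}

def pick_model(task: str) -> str:
    """Route a task to the smallest tier that should handle it; default to medium."""
    return MODEL_TIERS[TASK_TO_TIER.get(task, "medium")]

print(pick_model("summarize_notes"))       # gemma:2b
print(pick_model("run_agentic_workflow"))  # llama4
```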

What are some examples of these models?

Small models generally fall below 10B parameters. Examples of these are Gemma 2B, Mistral 7B, Qwen 3 8B, and Phi-3 mini.

Medium models range from roughly 10B up to around 70B parameters. Llama 3 70B, Gemma 3 12B, Qwen 3 32B, and Mixtral 8x7B are some examples here.

Large models run from roughly 70B up into the low hundreds of billions of parameters. Examples are Llama 4, Falcon 180B, DeepSeek-R1, and GPT-3.

Then there are the extra-large models, often referred to as the frontier models. These are generally in the hundreds of billions to trillions of parameters range. These are the models like Claude, GPT-4, Gemini, and others. They represent the latest breakthroughs in generative AI and offer the most diverse set of tools and abilities.

So, if you're developing AI features in a product, you clearly have a lot of options for which model to use. As you consider the needs of your feature, you can evaluate what you need to accomplish and select the smallest model that gets the job done, keeping resource requirements (and therefore costs) down.

Finally, how do you decide what to use? We talked about some use cases earlier. Generally speaking, the simpler the task, the smaller the model you'll need. The more involved or agentic you want to get, the bigger the model you may need. But honestly, those are just guidelines. If you're able to, you should experiment with different sizes of models and see which ones fit your needs best while weighing the impacts on user experience and cost per output. Through that experimentation, you'll be able to identify the sweet spot for the specific feature you're working on.
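If you want to make that experimentation a little more systematic, a small harness like this can help. It's just a sketch that assumes you have a few local Ollama models pulled (the model tags below are illustrative), and it uses latency and output length as crude stand-ins for user experience and cost per output:

```python
import time
import requests

# Illustrative model tags; substitute whichever sizes you're actually comparing.
CANDIDATES = ["gemma:2b", "mistral:7b", "qwen3:32b"]
PROMPT = "Summarize these meeting notes in three bullet points: ..."

for model in CANDIDATES:
    start = time.time()
    r = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": PROMPT, "stream": False},
        timeout=300,
    )
    r.raise_for_status()
    elapsed = time.time() - start
    output = r.json()["response"]
    # You'd still judge quality by reading the outputs; these numbers just flag
    # which sizes are fast and cheap enough to be worth a closer look.
    print(f"{model}: {elapsed:.1f}s, {len(output)} chars")
```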
