Hermes Ollama Setup: Best Models For Each Use Case

Knowing how to set up Hermes with Ollama is half the battle. Picking the right model for each task is the other half.

There are dozens of models on Ollama.

Most blog posts pretend they all work the same.

They don't.

This post tells you which Ollama model to pair with Hermes for each kind of task — based on what I actually run daily.

The Quick Answer

Best all-rounder: Gemma 4.

Best for agent workflows: DeepSeek.

Best for sub-agents: Nemotron 3 Nano Omni.

Best for code: Qwen 3.6.

Lightest (low-spec laptops): Gemma 4 (7GB).

If you only read this far: install Gemma 4 if you're new, then switch to DeepSeek when you're doing serious agent work.

How To Set Up Hermes With Ollama (5 Min)

Quick setup before we get into the model picks.

1 — Install Ollama

ollama.com → download → install.

Runs as a local server on http://localhost:11434.
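
A quick way to confirm the server is up is to ask it which models it has. The sketch below assumes the default address above and uses Ollama's /api/tags endpoint, which lists locally installed models. The helper names are mine, not part of Ollama or Hermes.

```python
import json
import urllib.error
import urllib.request
from typing import List, Optional

OLLAMA_URL = "http://localhost:11434"  # default address from the step above


def model_names(tags_payload: dict) -> List[str]:
    """Extract model names from the JSON returned by GET /api/tags."""
    return [m["name"] for m in tags_payload.get("models", [])]


def list_local_models(base_url: str = OLLAMA_URL, timeout: float = 3.0) -> Optional[List[str]]:
    """Return installed model names, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            return model_names(json.load(resp))
    except (urllib.error.URLError, OSError):
        return None
```

Calling list_local_models() gives you None when nothing is listening on that port, and an empty list when Ollama is running but you haven't pulled a model yet.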

2 — Install Hermes

Standard Hermes install from the GitHub repo.

If terminals scare you, use Claude Code or Codex to run the install: paste the command and ask it to "set this up".

3 — Pull a model in Ollama

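Run a terminal command like ollama run gemma4 or ollama run deepseek. ollama run downloads the model on first use and then opens a chat; ollama pull downloads it without starting one. Check the Ollama library for the exact model tag before you run it.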

4 — Connect Hermes to Ollama

In Hermes config:

Provider: ollama
URL: http://localhost:11434
Model: your downloaded model name

5 — Send a test message

If you get a reply, you're set up.
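
If the test message fails, it helps to know whether the problem is Hermes or Ollama. One way to check, sketched below, is to send a prompt straight to Ollama's /api/generate endpoint and skip Hermes entirely. The function names are my own, and the model name you pass in should be whatever you pulled in step 3.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # default Ollama address


def build_generate_request(model: str, prompt: str, base_url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send one prompt to Ollama and return the reply text."""
    request = build_generate_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)["response"]
```

If ask() returns text but Hermes still gives you nothing, the issue is most likely in the Hermes config rather than in Ollama.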

Now let's talk models.

Gemma 4 — The Beginner's Default

7GB.

Small.

Fast.

Surprisingly capable.

I recommend Gemma 4 if:

You're on a laptop with 8–16GB RAM.
You want a working setup in 5 minutes.
You're testing Hermes for the first time.

It handles short writing tasks, summarisation, and basic reasoning fine.

I covered Gemma 4 specifically in Hermes Gemma 4.

DeepSeek — The Agent Workhorse

DeepSeek is built for agent tasks.

Tool use is solid.

Multi-step reasoning works well.

Sized to fit on most modern machines.

Use DeepSeek when:

You're running Hermes with skills (web search, browser, terminal).
You want better reasoning than Gemma 4.
You're doing real agentic work, not just chat.

I cover DeepSeek deeper in Hermes DeepSeek.

Nemotron 3 Nano Omni — The Sub-Agent King

Nvidia's recent release.

28GB.

Designed specifically for agentic tasks at scale.

Best feature: it's tuned to power sub-agents.

Use Nemotron 3 when:

You're running multi-agent setups.
You want each sub-agent to be efficient and focused.
You've got the RAM (32GB+ recommended).

If you're playing with Hermes agent workspaces where you have a brain agent + sub-agents, this is the sub-agent model.

Qwen 3.6 — Best For Code

Qwen 3.6 has stronger code generation than Gemma or DeepSeek.

Use when:

Your Hermes agent does coding tasks.
You're integrating with file/terminal skills.
You want cleaner code output.

If you go the free cloud route, the equivalent is Qwen 3.5 Cloud.

Models To Avoid (Or Use Carefully)

Three categories.

1. Anything 70B+ on a normal laptop. Will crash or be unusably slow.

2. Older models from before 2024. Most aren't agent-tuned.

3. Vague "general-purpose" models without agent benchmarks. Stick to the picks above.

Free Cloud Models Inside Hermes

If your hardware can't handle local, Hermes also supports free cloud tiers.

Recommended free cloud picks:

Kimi K2.5 — strong reasoning.
GLM 5.1 — solid all-rounder.
Qwen 3.5 Cloud — code-friendly.
Minimax M2.7 Cloud — consistent.

These have token limits but they're genuinely usable for testing.

I cover Kimi specifically in Kimi K2.6 Agent Swarms.

Picking By Hardware

Here's the honest breakdown.

8GB RAM laptops: Gemma 4. Don't try anything bigger.

16GB RAM laptops: Gemma 4 daily, DeepSeek for serious tasks.

32GB RAM machines: DeepSeek as default, Nemotron 3 for sub-agents, Qwen 3.6 for code.

64GB+: Run anything. Mix and match per agent.
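
The same breakdown can be written as a small lookup, which is handy if you script your setup across machines. This is only a sketch of the tiers above. The function name is made up, and treating in-between sizes (say 24GB) as the tier below is my assumption.

```python
from typing import List

# RAM tiers copied from the breakdown above: (minimum GB of RAM, suggested models).
# Ordered from largest tier to smallest so the first match wins.
HARDWARE_TIERS = [
    (64, ["any model, mixed per agent"]),
    (32, ["DeepSeek", "Nemotron 3", "Qwen 3.6"]),
    (16, ["Gemma 4", "DeepSeek"]),
    (8, ["Gemma 4"]),
]


def models_for_ram(ram_gb: float) -> List[str]:
    """Return the suggested models for a machine with ram_gb of RAM."""
    for min_gb, models in HARDWARE_TIERS:
        if ram_gb >= min_gb:
            return models
    return []  # below 8GB this post makes no local recommendation
```

An empty list means the machine sits below the smallest tier, which is where the free cloud models come in.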

Picking By Use Case

Match the model to the work.

Daily chat: Gemma 4.

Research with web search skill: DeepSeek.

Multi-agent setups: Nemotron 3 (sub-agents) + DeepSeek (brain agent).

Code generation: Qwen 3.6.

Long-form writing: Gemma 4 or DeepSeek.

Background automation: Whatever runs reliably on your machine.
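
If you route tasks in a script, the list above boils down to a dictionary. A minimal sketch, assuming plain-text task labels of my own choosing and Gemma 4 as the fallback because it's the all-rounder pick. None of this is Hermes config syntax.

```python
# Use case -> model, taken from the list above. The keys are illustrative labels.
MODEL_BY_USE_CASE = {
    "daily chat": "Gemma 4",
    "research": "DeepSeek",
    "brain agent": "DeepSeek",
    "sub-agent": "Nemotron 3",
    "code": "Qwen 3.6",
    "long-form writing": "Gemma 4",  # the post says Gemma 4 or DeepSeek
}


def pick_model(use_case: str, default: str = "Gemma 4") -> str:
    """Return the suggested model for a use case, or the default if unknown."""
    return MODEL_BY_USE_CASE.get(use_case.strip().lower(), default)
```

Unknown labels fall back to the default instead of raising an error, so a typo in a task name still gets you a working model.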

Switching Models Quickly

Hermes lets you swap models without reinstalling.

In the Hermes config, change the model name and restart.

You can also have multiple model entries in config — point different agents at different models.

This is one of Hermes' best features and matches what's possible in Hermes Agent Mission Control.

Common Model-Picking Mistakes

1. Picking too big.

People install the biggest model "just in case". Then their machine crashes.

Match the model to your RAM.
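
Here's a rough way to sanity-check that before you download anything. The sizes come from this post (Gemma 4 at 7GB, Nemotron 3 Nano Omni at 28GB). The 90% ceiling is my assumption, picked because it agrees with the pairings in this post, and real memory use also depends on context length and what else is running.

```python
from typing import List

# Model sizes in GB as quoted in this post.
MODEL_SIZES_GB = {
    "Gemma 4": 7,
    "Nemotron 3 Nano Omni": 28,
}


def fits_in_ram(model_gb: float, ram_gb: float, max_fraction: float = 0.9) -> bool:
    """True if the model takes no more than max_fraction of total RAM.

    max_fraction is an assumed rule of thumb, not a measured limit.
    """
    return model_gb <= ram_gb * max_fraction


def models_that_fit(ram_gb: float) -> List[str]:
    """List the models from MODEL_SIZES_GB that pass the check for ram_gb."""
    return [name for name, size in MODEL_SIZES_GB.items() if fits_in_ram(size, ram_gb)]
```

On a 16GB laptop, models_that_fit(16) leaves you with Gemma 4 only, which matches the hardware breakdown earlier.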

2. Picking too small.

Tiny models can't handle tool use well.

Don't go below Gemma 4's size (about 7GB) if you're doing real agent work.

3. Not matching model to task.

A code-tuned model writes worse poetry. Pick by use case, not vibes.

The Real Cost Of Free Local

Be straight about this.

Free Ollama models cost:

Disk space (each model is 5–28GB).
A few pence/day in electricity.
Some upfront time to learn what works for you.

That's still roughly 95% cheaper than running cloud APIs daily, though the exact saving depends on how heavily you use them.
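
The electricity line is easy to check for your own machine. The sketch below is plain arithmetic: power draw times hours gives kWh, and kWh times your tariff gives the cost. Every number in the example is an assumption, so swap in your own wattage, hours, and price per kWh.

```python
def daily_electricity_cost_pence(watts: float, hours_per_day: float, pence_per_kwh: float) -> float:
    """Estimate daily running cost in pence from power draw, usage, and tariff."""
    kwh_per_day = watts / 1000 * hours_per_day
    return kwh_per_day * pence_per_kwh


# Assumed example: a laptop drawing 60W under load for 4 hours at 25p per kWh.
example_cost = daily_electricity_cost_pence(watts=60, hours_per_day=4, pence_per_kwh=25)
print(f"About {example_cost:.1f}p per day")
```

With those assumed inputs the estimate lands at about 6p a day, which is in line with "a few pence".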

FAQ — Best Hermes Ollama Models

Which Ollama model is best for Hermes overall?

DeepSeek for serious agent work. Gemma 4 for lightweight machines.

Can I run multiple Ollama models in Hermes?

Yes. Give each model its own entry in config and switch as needed.

What's the smallest model that works well?

Gemma 4 (~7GB) is the floor.

Smaller models struggle with tool use.

Do I need a GPU?

Not strictly — Ollama uses CPU + GPU as available.

A modern laptop CPU handles small models fine.

Which model is best for web scraping?

DeepSeek — solid tool use and reasoning.

Which model is best for code generation?

Qwen 3.6 (or Qwen 3.5 Cloud if you're on the free cloud tier).

Should beginners use cloud or local?

Local with Gemma 4. Easiest start, zero cost.