Hermes Ollama Setup: Best Models For Each Use Case

How to set up Hermes with Ollama is half the battle. Picking the right model for each task is the other half.

There are dozens of models on Ollama.

Most blog posts pretend they all work the same.

They don't.

This post tells you which Ollama model to pair with Hermes for each kind of task — based on what I actually run daily.

The Quick Answer

Best all-rounder: Gemma 4.

Best for agent workflows: DeepSeek.

Best for sub-agents: Nemotron 3 Nano Omni.

Best for code: Qwen 3.6.

Lightest (low-spec laptops): Gemma 4 (7GB).

If you only read this far: install Gemma 4 if you're new, and switch to DeepSeek when you're doing serious agent work.

How To Set Up Hermes With Ollama (5 Min)

Quick setup before we get into the model picks.

1 — Install Ollama

ollama.com → download → install.

Runs as a local server on http://localhost:11434.
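
To confirm it's running, you can hit the local API directly. This uses Ollama's standard /api/tags endpoint and should return a JSON list of any models you've already pulled.

```
# Check that the Ollama server is up and list installed models
curl http://localhost:11434/api/tags
```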

2 — Install Hermes

Standard Hermes install from the GitHub repo.

If terminals scare you, use Claude Code or Codex to run the install — paste the command and ask "set this up".

3 — Pull a model in Ollama

Terminal command, like ollama run gemma4 or ollama run deepseek.
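
Here's the rough shape of those commands. The exact model tags depend on what's published in the Ollama library when you read this, so treat the names below as placeholders rather than guaranteed tags.

```
# Download a model without starting a chat
ollama pull gemma4

# Or pull it and drop straight into an interactive chat
ollama run deepseek

# See what's installed and how much disk each model uses
ollama list
```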

4 — Connect Hermes to Ollama

In the Hermes config, point your model provider at Ollama's local server and name the model you pulled. A sketch follows below.
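
I won't pretend these are the exact key names; the schema varies by Hermes version, so check the docs for yours. The file path and field names below are hypothetical, but the shape is the point: a provider entry that points at http://localhost:11434 plus the model tag.

```
# Hypothetical config snippet. The path and key names are illustrative,
# not Hermes' actual schema; the base URL and model tag are what matter.
cat > ~/.hermes/config.json <<'EOF'
{
  "provider": "ollama",
  "base_url": "http://localhost:11434",
  "model": "gemma4"
}
EOF
```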

5 — Send a test message

If you get a reply, you're set up.
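
If Hermes doesn't reply, test Ollama on its own before digging through the config. This hits Ollama's /api/generate endpoint directly, so it tells you whether the problem is the model or the Hermes connection.

```
# Ask the model for a reply directly, bypassing Hermes
curl http://localhost:11434/api/generate -d '{
  "model": "gemma4",
  "prompt": "Say hello in one sentence.",
  "stream": false
}'
```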

Now let's talk models.

Gemma 4 — The Beginner's Default

7GB.

Small.

Fast.

Surprisingly capable.

I recommend Gemma 4 if:

You're new to local models and want the easiest start.

You're on an 8GB or 16GB laptop.

Your day-to-day is chat, short writing, and summarisation.

It handles short writing tasks, summarisation, and basic reasoning fine.

I covered Gemma 4 specifically in Hermes Gemma 4.

DeepSeek — The Agent Workhorse

DeepSeek is built for agent tasks.

Tool use is solid.

Multi-step reasoning works well.

Sized to fit on most modern machines.

Use DeepSeek when:

You're running serious agent workflows with tool use.

You're doing research with the web search skill.

You need a brain agent to coordinate a multi-agent setup.

I cover DeepSeek deeper in Hermes DeepSeek.

Nemotron 3 Nano Omni — The Sub-Agent King

Nvidia's recent release.

28GB.

Designed specifically for agentic tasks at scale.

Best feature: it's tuned to power sub-agents.

Use Nemotron 3 when:

You're running agent workspaces with a brain agent plus sub-agents.

You want a model tuned specifically for agentic tasks at scale.

You have the RAM for a 28GB model (32GB+ machines).

If you're playing with Hermes agent workspaces where you have a brain agent + sub-agents, this is the sub-agent model.

Qwen 3.6 — Best For Code

Qwen 3.6 has stronger code generation than Gemma or DeepSeek.

Use Qwen 3.6 when:

Code generation is the main job.

You want stronger code output than Gemma or DeepSeek will give you.

If you go the free cloud route instead, the equivalent is Qwen 3.5 Cloud.

🔥 Want my full Hermes model stack? Inside the AI Profit Boardroom, I share my model picks per use case, system prompts per agent, and a 2-hour Hermes course covering every config. Plus weekly live coaching where you can share your screen and we'll dial in your Ollama setup. 2,800+ members. → Get the stack here

Models To Avoid (Or Use Carefully)

Three categories.

1. Anything 70B+ on a normal laptop. Will crash or be unusably slow.

2. Older models pre-2024. Most aren't agent-tuned.

3. Vague "general-purpose" models without agent benchmarks. Stick to the picks above.

Free Cloud Models Inside Hermes

If your hardware can't handle local, Hermes also supports free cloud tiers.

Recommended free cloud picks:

Qwen 3.5 Cloud for code.

Kimi K2.6 for agent swarms and multi-agent work.

These have token limits but they're genuinely usable for testing.

I cover Kimi specifically in Kimi K2.6 Agent Swarms.

Picking By Hardware

Here's the honest breakdown.

8GB RAM laptops: Gemma 4. Don't try anything bigger.

16GB RAM laptops: Gemma 4 daily, DeepSeek for serious tasks.

32GB RAM machines: DeepSeek as default, Nemotron 3 for sub-agents, Qwen 3.6 for code.

64GB+: Run anything. Mix and match per agent.

Picking By Use Case

Match the model to the work.

Daily chat: Gemma 4.

Research with web search skill: DeepSeek.

Multi-agent setups: Nemotron 3 (sub-agents) + DeepSeek (brain agent).

Code generation: Qwen 3.6.

Long-form writing: Gemma 4 or DeepSeek.

Background automation: Whatever runs reliably on your machine.

Switching Models Quickly

Hermes lets you swap models without reinstalling.

In the Hermes config, change the model name and restart.

You can also keep multiple model entries in the config and point different agents at different models (a sketch follows below).
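
Again, the exact schema depends on your Hermes version; the structure below is hypothetical and only meant to show the idea of one entry per agent role.

```
# Hypothetical multi-model config; file path and key names are illustrative only.
cat > ~/.hermes/config.json <<'EOF'
{
  "models": {
    "brain":     { "provider": "ollama", "model": "deepseek" },
    "sub_agent": { "provider": "ollama", "model": "nemotron3-nano" },
    "coder":     { "provider": "ollama", "model": "qwen3.6" }
  }
}
EOF
```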

This is one of Hermes' best features and matches what's possible in Hermes Agent Mission Control.

Common Model-Picking Mistakes

1. Picking too big.

People install the biggest model "just in case". Then their machine crashes.

Match the model to your RAM. (A quick way to check is sketched at the end of this section.)

2. Picking too small.

Tiny models can't handle tool use well.

Don't go below Gemma 4 size if you're doing real agent work.

3. Not matching model to task.

A code-tuned model writes worse poetry. Pick by use case, not vibes.
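
On mistake 1, here's a quick sanity check before you commit to a big download. The free command is Linux; on macOS, check memory in Activity Monitor instead.

```
# How much RAM do you actually have? (Linux)
free -h

# How big are the models you've already pulled?
ollama list
```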

The Real Cost Of Free Local

Let's be straight about this.

Free Ollama models cost:

Disk space (7GB to 28GB per model).

RAM and CPU/GPU time while they run.

Slower responses than a cloud API on modest hardware.

That's still 95% cheaper than running cloud APIs daily.

🚀 Want a full Hermes + Ollama playbook? The AI Profit Boardroom has the 2-hour Hermes course, my model rotation rules, and weekly live coaching. Plus daily training drops on AI agents and SEO. 2,800+ members. → Join here

FAQ — Best Hermes Ollama Models

Which Ollama model is best for Hermes overall?

DeepSeek for serious agent work. Gemma 4 for lightweight machines.

Can I run multiple Ollama models in Hermes?

Yes — define them as separate providers in config and switch as needed.

What's the smallest model that works well?

Gemma 4 (~7GB) is the floor.

Smaller models struggle with tool use.

Do I need a GPU?

Not strictly — Ollama uses CPU + GPU as available.

A modern laptop CPU handles small models fine.
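
If you're curious how a loaded model is actually being served, Ollama will tell you:

```
# Shows loaded models and whether they're running on CPU, GPU, or a split
ollama ps
```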

Which model is best for web scraping?

DeepSeek — solid tool use and reasoning.

Which model is best for code generation?

Qwen 3.6 (or its cloud variant Qwen 3.5).

Should beginners use cloud or local?

Local with Gemma 4. Easiest start, zero cost.

Related Reading

Hermes Gemma 4

Hermes DeepSeek

Kimi K2.6 Agent Swarms

Hermes Agent Mission Control

📺 Video notes + links to the tools 👉 https://www.skool.com/ai-profit-lab-7462/about

🎥 Learn how I make these videos 👉 https://aiprofitboardroom.com/

🆓 Get a FREE AI Course + Community + 1,000 AI Agents 👉 https://www.skool.com/ai-seo-with-julian-goldie-1553/about

That's how to set up Hermes with Ollama and pick the right model for the job. No more guessing which one to install.

Ready to Succeed With AI?

Join 2,800+ entrepreneurs inside the AI Profit Boardroom. Get proven AI workflows, daily coaching, and a community that actually helps you win.

Join The AI Profit Boardroom →

7-Day No-Questions Refund • Cancel Anytime