Building Normiebench: Benchmarking Local AI on Normal People’s Hardware

I bought my Mac mini M4 to be my OpenClaw machine.

That part was rational.

Then I found a fully loaded 2013 Mac Pro trashcan on eBay for $167 and bought it for a much less rational reason: I have always wanted one because they are beautiful.

That glossy black cylinder is still one of the coolest-looking computers Apple has ever made. I did not buy it because I thought it would be some amazing local AI box. I bought it because I wanted a trashcan Mac Pro on my desk.

But after I actually started using it, I was surprised by how fast it felt.

Not “wow this destroys modern hardware” fast. Obviously not that. But fast enough that I kept thinking: okay, how close is this thing really to my Mac mini M4 when it comes to running small local models?

That question is what eventually turned into Normiebench.

The Original Goal Was Just a Fair Comparison

At first I was not trying to build a public benchmark.

I just wanted a clean way to compare two machines I owned:

My Mac mini M4
My fully loaded 2013 Mac Pro trashcan

That sounds easy until you actually try to make it fair.

A lot of AI benchmark comparisons end up being messy because people change too many variables at once. Different prompts. Different settings. Different seeds. Different quantizations. Different runtimes. Different context lengths. Then the result gets presented like a definitive answer when it is really just a vibe check.

I wanted something much simpler and much more honest.

If I was going to compare the machines, I wanted both of them running the same model setup, the same prompts, the same seeds, and the same basic generation conditions. That way I would be measuring the hardware and runtime behavior instead of accidental randomness.

That was the key idea: make it standardized enough to be fair, but simple enough that normal people would actually run it.
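
To make the "same everything" idea concrete, here is a minimal sketch of how a pinned setup could be expressed in Python. This is not Normiebench's actual code: the BenchConfig class, its field names, and every default value are hypothetical placeholders. The idea is that all the knobs live in one frozen object, and a short fingerprint of that object lets you check that two machines really ran under the same conditions.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class BenchConfig:
    """Everything that must match between two runs.

    All names and defaults are illustrative assumptions,
    not Normiebench's real settings.
    """
    model: str = "example-small-model"
    quantization: str = "q4"
    prompts: tuple = (
        "Explain what a benchmark is.",
        "Write a haiku about old computers.",
    )
    seeds: tuple = (1, 2, 3)
    max_new_tokens: int = 128
    temperature: float = 0.0
    context_length: int = 2048


def fingerprint(config: BenchConfig) -> str:
    """Return a short, stable hash of the full configuration."""
    # sort_keys makes the JSON text independent of field order.
    payload = json.dumps(asdict(config), sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


def comparable(a: BenchConfig, b: BenchConfig) -> bool:
    """Two results are only comparable if their setups hash the same."""
    return fingerprint(a) == fingerprint(b)


if __name__ == "__main__":
    mac_mini = BenchConfig()
    trashcan = BenchConfig()
    tweaked = BenchConfig(seeds=(4, 5, 6))
    print(comparable(mac_mini, trashcan))  # same setup on both machines
    print(comparable(mac_mini, tweaked))   # someone changed the seeds
```

Storing the fingerprint next to each result would make it easy to spot a run where somebody changed a seed or a quantization without meaning to.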

The Trashcan Was Way More Interesting Than I Expected

I expected the Mac mini M4 to win.

What I did not expect was how respectable the trashcan Mac Pro would feel in the process.

That machine is over a decade old. It was basically an impulse purchase because I liked the industrial design. And yet once I started running local AI workloads on it, it felt much more alive than I expected.

That is the kind of thing I love finding out.

There is something really fun about old hardware still having real life left in it, especially for local AI where the conversation is often dominated by giant GPUs, huge servers, and benchmark numbers that are completely disconnected from what most people actually own.

The trashcan made me curious not just about my two machines, but about the broader category of regular-person hardware.

From One Comparison to a Real Benchmark

Once I had a decent way to compare the Mac Pro and the Mac mini, the next question came naturally:

What about all the other machines people are actually running local models on?

That is where the idea for Normiebench really clicked.

There are already plenty of benchmarks for top-end hardware. If you want to compare expensive GPUs, datacenter cards, or wildly overbuilt setups, that information exists.

What I wanted was the opposite.

I wanted a benchmark for:

Older Macs
Mac minis
Laptops
Random desktops
Mini PCs
Midrange GPUs
The stuff people already have

Basically: a benchmark for normal people’s hardware running small local LLMs.

That is what made the name feel right.

Normiebench is not trying to be the most academic benchmark in the world. It is trying to answer a practical question:

If I run a small local model on this machine, what kind of experience should I expect?

That is a useful question.

What I Wanted the Benchmark to Be

I wanted Normiebench to be a few specific things.

1. Simple

If running the benchmark feels annoying, too technical, or too easy to mess up, a lot of people will never bother. So simplicity mattered a lot.

2. Fair

If every person tweaks the setup in a slightly different way, the leaderboard becomes noise. That is why standardization matters. Same seeds. Same benchmark flow. Same general setup. Fewer excuses.

3. Relevant

I care a lot more about “how does this machine feel when generating with a small local LLM?” than I do about abstract numbers that do not map to a real user experience.
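
One way to turn "how does it feel" into numbers is to look at how long you wait for the first token and how quickly tokens arrive after that. The sketch below assumes those two metrics, which are my illustration here and not necessarily what Normiebench reports. The RunTiming class and the function names are hypothetical, and the timestamps are assumed to come from whatever runtime is generating the tokens.

```python
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class RunTiming:
    """Timestamps for one generation run (hypothetical structure)."""
    start: float        # seconds, when the prompt was submitted
    token_times: tuple  # seconds, arrival time of each generated token


def time_to_first_token(run: RunTiming) -> float:
    """Seconds the user waits before anything appears."""
    if not run.token_times:
        raise ValueError("run produced no tokens")
    return run.token_times[0] - run.start


def tokens_per_second(run: RunTiming) -> float:
    """Generation speed once output has started.

    Measured from the first token to the last, so prompt
    processing time does not distort the figure.
    """
    count = len(run.token_times)
    if count < 2:
        return 0.0
    elapsed = run.token_times[-1] - run.token_times[0]
    if elapsed <= 0:
        return 0.0
    return (count - 1) / elapsed


def summarize(runs: list) -> dict:
    """Median across runs, so one slow outlier does not dominate."""
    return {
        "median_ttft_s": median(time_to_first_token(r) for r in runs),
        "median_tok_per_s": median(tokens_per_second(r) for r in runs),
    }


if __name__ == "__main__":
    runs = [
        RunTiming(start=0.0, token_times=(0.5, 1.0, 1.5, 2.0, 2.5)),
        RunTiming(start=10.0, token_times=(10.25, 10.5, 10.75)),
    ]
    print(summarize(runs))
```

Using the median across seeds is a design choice for the sketch: it keeps a single hiccup, like a background process waking up, from skewing the result.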

4. Focused on Real Hardware

Not lab hardware. Not dream hardware. Not fantasy YouTube benchmark hardware.

Normal hardware.

That focus is the whole point.

Why I Think This Matters

One thing I really like about local AI right now is that useful models keep getting more accessible.

You do not need a ridiculous setup to do interesting things anymore. Small models keep improving. Inference stacks keep improving. Old machines can still surprise you. Midrange machines can be genuinely capable. That is a much more interesting story to me than endlessly talking about hardware most people will never buy.

I think a lot of people are curious about local AI, but they do not know whether the machine they already own is “good enough” to bother with.

That is exactly the kind of uncertainty I wanted Normiebench to help reduce.

If somebody can look at benchmark results and see where their Mac mini, old Mac Pro, laptop, or desktop lands relative to other normal systems, that makes local AI feel a lot more concrete.

The Funny Part

The funny part is that none of this would have happened if I had not bought a computer mostly because it looked cool.

The trashcan Mac Pro was supposed to be a fun desk object. A little Apple design trophy. Something I always wanted and finally found cheap enough to justify.

Instead it ended up pushing me into building a benchmark project I actually wanted to use.

That is probably my favorite kind of project origin story: not a giant master plan, just curiosity turning into something real.

Check It Out

If you want to see the benchmark, go to normiebench.com.

If you want the code, it is here: github.com/brand-o/normiebench

And if you also think the 2013 Mac Pro trashcan is one of the prettiest computers ever made, yes, I still agree.