Hamza Mostafa

Technical writer & builder

Field notes / 2026

Hamza Mostafa

I write about agents, systems, and how intelligence becomes useful in the real world.


06

Featured essay

Evaluation

Evals Are Bullsh*t

Most evals measure whether an agent looks good. Good evals measure whether it does the job your business actually needs.

August 1, 2026

6 min read

Selected writing

The archive

01
Agent Systems
The Model Is the Wrong Thing to Own
Enterprises should own their intelligence. That means accumulating domain knowledge every new model can inherit, not training a model of their own.
July 24, 2026
5 min read

02
Evaluation
SalesBench: The Long-Horizon Agent-to-Agent Eval
A long-horizon RL environment where a small model learns to manage an insurance sales pipeline against an LLM buyer, scored by revenue closed instead of by an LLM judge. The trained model vastly outperforms the untrained base, and the gap widens as the eval gets harder.
May 14, 2026
12 min read

03
Agent Systems
The Agent Research Loop
What Karpathy's autoresearch really means, where agent systems are headed, and an open-source harness that ran 550 experiments over a weekend.
March 16, 2026
8 min read

04
Personal Agents
I Run a Personal AI Agent 24/7 on a Mac Mini. Here's How It Actually Works.
A Mac Mini, some markdown files, and seven communication channels. Inside the setup that gives me a 24/7 AI assistant that monitors my email, iMessage, WhatsApp, and Twitter - and actually does useful things.
March 7, 2026
12 min read

About / Now

Building, studying, and writing in public.

Background

I'm a builder and tinkerer. Previously on the Agent team at OpenAI. On leave from CS at the University of Waterloo, and a KP fellow.

Current questions

Right now I'm exploring multi-agent systems, long-horizon agents, and continual learning.

The common thread is simple: how do capable systems become useful, dependable, and better with experience?

Contact

Say hello.

The fastest way to reach me is a DM on X or a note on LinkedIn. Interesting questions are always welcome.

X: @hamostaf04
GitHub: Hamza-Mos
LinkedIn: hamza-mostafa