May 2026

The 7 Levels of Agent Engineering

Why most people stop at L2 and think they're at L5.

The Ladder Nobody Climbs

There are seven levels of agent engineering. Almost nobody in the industry is past L2. Most of the people selling you "AI agents" at five-figure prices are working at L1.

This isn't a hot take. It's a complexity classification. Each level requires a discipline the previous level didn't need. You can't get to L5 by writing a longer prompt. You can't get to L6 by adding more tools. The levels are not a difficulty curve — they're different problems that look similar from the outside.

The setup agencies charge $5K to $35K for is prompt engineering plus platform configuration: ten minutes of actual work, wrapped in a white-label dashboard. That's L1 with a logo.

L1 — Prompts

The monkey layer. You write text, the model produces text. Every AI agency on Earth is selling this and calling it something else. ChatGPT wrappers. Persona templates. "Custom GPTs." Vapi scripts.

What you're paying for is somebody who learned a few prompt patterns and got a platform login. It is not engineering. It is typing.

L2 — Tools

The model can call functions. Function calling. MCP. Tool use. Now the agent can do things — query a database, send an email, hit an API.

This is where the marketing copy turns into "AI agents that take action." It sounds bigger than it is. At L2 you have a prompt that picks a tool and fills in arguments. The model is still just generating text — some of that text just happens to be a function name.

L2 is necessary. It is not sufficient. An L2 agent will confidently call the wrong tool with confidently wrong arguments, because nothing in L1 or L2 gives it any grasp of what it's doing.
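To make the L2 point concrete, here is a minimal Python sketch of what a tool call amounts to once the model has responded. It assumes the model returns a JSON string with `name` and `arguments` fields. The `send_email` and `query_db` tools and the `dispatch` helper are hypothetical, and real function-calling APIs differ in their exact response shapes.

```python
import json

# Hypothetical tool registry: these names and functions are illustrative only.
def send_email(to: str, subject: str) -> str:
    return f"sent '{subject}' to {to}"

def query_db(sql: str) -> str:
    return f"ran: {sql}"

TOOLS = {"send_email": send_email, "query_db": query_db}

def dispatch(model_output: str) -> str:
    """Treat the model's text output as a tool call and run it.

    The model only produced text. This code decides that part of that
    text is a function name and part of it is arguments.
    """
    call = json.loads(model_output)
    fn = TOOLS[call["name"]]          # KeyError if the model invents a tool
    return fn(**call["arguments"])    # TypeError if argument names don't fit

# Stand-in for model output; a real system would receive this string from an LLM API.
raw = '{"name": "send_email", "arguments": {"to": "ops@example.com", "subject": "Q3 report"}}'
print(dispatch(raw))  # sent 'Q3 report' to ops@example.com
```

Look at what the dispatcher actually checks: the tool name exists, and the argument names fit the signature. Nothing checks whether emailing that address was the right thing to do.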

L3 — Context

Now the agent has knowledge. Wikis, RAG, skills, retrieval. The "second brain" everybody got excited about when Karpathy said it.

L3 is where the agent stops sounding generic. It knows your business, your patterns, your past decisions. It pulls the right document into the prompt at the right time.

But — and this is the part the second-brain crowd glosses over — context without structure is just a bigger prompt. The agent reads. It does not understand. We'll come back to this.
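Here is a toy sketch of the L3 retrieval step. It assumes a small in-memory document set and ranks documents by word overlap, standing in for the embedding search a real RAG pipeline would use. `DOCS`, `retrieve`, and `build_prompt` are illustrative names only.

```python
import re

# Hypothetical knowledge base; a real L3 setup would sit on a wiki, vector store, etc.
DOCS = {
    "refunds": "Refunds over $500 require manager approval.",
    "shipping": "Orders ship within 2 business days from the Austin warehouse.",
    "pricing": "Enterprise pricing is negotiated per contract.",
}

def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query: str, k: int = 1) -> list[str]:
    """Rank documents by word overlap with the query (a toy stand-in for embedding search)."""
    q = tokens(query)
    ranked = sorted(DOCS.values(), key=lambda d: len(q & tokens(d)), reverse=True)
    return ranked[:k]

def build_prompt(query: str) -> str:
    # The retrieved text is simply pasted ahead of the question.
    # Nothing here structures or checks what the model then does with it.
    context = "\n".join(retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}"

print(build_prompt("Do refunds need approval?"))
```

The retrieved passage is concatenated into the prompt and nothing more. That is the sense in which context without structure is just a bigger prompt.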

L4 — Harnesses

This is where engineering starts.

A harness is runtime control over what flows into and out of the agent. Hooks. Middleware. Workflow orchestration. Routing logic. Validators. The harness is the system around the agent that determines what it sees, what it can do, when it runs, and what happens to its output.

Skills are L3 content delivered through an L4 harness. MCP is L2 capability exposed through an L4 harness. The harness is what turns scattered capability into a system.
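Here is one way to picture a harness, sketched in Python under stated assumptions. The agent is any function from prompt to text, stubbed here. Pre-hooks rewrite what the agent sees, and validators decide whether its output is released. The `Harness` class and the hook and validator names are hypothetical. Production harnesses add routing, scheduling, and retries on top of this.

```python
from typing import Callable

Agent = Callable[[str], str]
Hook = Callable[[str], str]
Validator = Callable[[str], bool]

class Harness:
    """Minimal runtime control around an agent call (hypothetical sketch).

    pre_hooks control what flows in; validators control what flows out.
    """

    def __init__(self, agent: Agent, pre_hooks: list[Hook], validators: list[Validator]):
        self.agent = agent
        self.pre_hooks = pre_hooks
        self.validators = validators

    def run(self, user_input: str) -> str:
        text = user_input
        for hook in self.pre_hooks:       # shape what the agent sees
            text = hook(text)
        output = self.agent(text)
        for check in self.validators:     # gate what leaves the system
            if not check(output):
                raise ValueError(f"output rejected by {check.__name__}")
        return output

def echo_agent(prompt: str) -> str:
    """Stub standing in for a model call."""
    return f"DRAFT: {prompt}"

def strip_secrets(text: str) -> str:
    return text.replace("sk-12345", "[REDACTED]")

def no_secrets(output: str) -> bool:
    return "sk-" not in output

h = Harness(echo_agent, [strip_secrets], [no_secrets])
print(h.run("summarize ticket, key sk-12345"))  # DRAFT: summarize ticket, key [REDACTED]
```

The agent itself is unchanged. Everything that makes this a system lives in the code around the call.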

L4 is where the fork happens.

The Fork: Path A vs Path B

At L4, the road splits.

Path A — Deploy and profit. You have a working harness around a competent agent with the right context. You ship it. You make money. Most agencies that actually deliver value live here. This is honest L4 work. It works.

Path B — The admissibility path. You decide the agent isn't allowed to be wrong. Now you're going to L5, L6, L7. You're going to make the agent prove what it says.

Both paths are legitimate. Path A pays the bills. Path B is what we're building. Most people don't even know Path B exists.

L5 — Admissibility

The agent's outputs must be composed from validated parts.

Schema keys become a semantic web. The semantic web becomes an ontology. The ontology becomes a constraint the agent literally cannot violate, because the structure of validated observations prevents the wrong composition from being admitted.

An L5 agent can't hallucinate the way an L4 agent can. Not because we added a "fact check" step. Because every claim has to chain back to validated observations through a composition of parts that each individually check out. If the chain doesn't close, the claim doesn't ship.

This is what we call admissibility engineering. It's the discipline of making AI reason correctly, not just fluently.
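The rule above says a claim ships only if its chain back to validated observations closes. That rule can be sketched as a recursive check over a support graph. This is one possible reading, not the SOMA/GNOSYS implementation. The observation IDs, the `SUPPORT` map, and `admissible` are hypothetical, and a fuller version would also validate each composition step, not just the leaves.

```python
# Hypothetical data model: validated observations plus a map from each claim
# to the parts it is composed from. Illustrative only, not the author's system.
VALIDATED = {"obs:invoice_1042_paid", "obs:invoice_1042_amount_900"}

SUPPORT = {
    "claim:customer_paid_900": ["obs:invoice_1042_paid", "obs:invoice_1042_amount_900"],
    "claim:account_in_good_standing": ["claim:customer_paid_900"],
    "claim:eligible_for_discount": ["claim:account_in_good_standing", "obs:loyalty_tier_gold"],
}

def admissible(node: str, _visiting: frozenset = frozenset()) -> bool:
    """True only if node is a validated observation, or every part it composes from is admissible."""
    if node in VALIDATED:
        return True
    if node in _visiting or node not in SUPPORT:
        return False              # cycle or unsupported part: the chain does not close
    parts = SUPPORT[node]
    return bool(parts) and all(admissible(p, _visiting | {node}) for p in parts)

for claim in SUPPORT:
    print(claim, "ships" if admissible(claim) else "blocked")
# claim:customer_paid_900 ships
# claim:account_in_good_standing ships
# claim:eligible_for_discount blocked
```

The discount claim is fluent and plausible, but `obs:loyalty_tier_gold` was never validated. The claim is blocked by structure, not by a separate fact-check pass.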

L6 — Concentration

L5 controls what the agent says. L6 controls how the agent thinks.

In-context learning metrics. CoR (chain of reasoning) anti-drift. Traceback as self-reflection. Compound intelligence. The agent observes its own cognition, scores it, and corrects.

L6 is when the agent stops being a thing you operate and starts being a thing that monitors itself. The metrics aren't bolted on. They're how it thinks. Drift is detected and corrected in the same loop that produces the work.
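Here is a minimal sketch of detecting and correcting drift inside the same loop that produces the work. Several pieces are hypothetical stand-ins chosen to keep the example deterministic: the drift metric (the share of goal words each step still mentions), the 0.5 threshold, and the `scripted_model` stub. They are not the in-context learning metrics or CoR anti-drift scoring the post refers to.

```python
import re
from typing import Callable

def words(text: str) -> set[str]:
    return set(re.findall(r"[a-z]+", text.lower()))

def relevance(step: str, goal: str) -> float:
    """Toy drift metric: share of the goal's words that the step still mentions."""
    g = words(goal)
    return len(g & words(step)) / len(g)

def run_with_drift_check(goal: str, propose: Callable, max_steps: int = 3,
                         threshold: float = 0.5) -> list[tuple[str, float, bool]]:
    """Produce steps and score each one inside the same loop; re-anchor steps that drift."""
    trace: list[tuple[str, float, bool]] = []
    for _ in range(max_steps):
        step = propose(goal, trace)
        corrected = False
        if relevance(step, goal) < threshold:
            # Correction: drop the drifting step and re-propose with the goal restated.
            step = propose(goal, trace, reanchor=True)
            corrected = True
        trace.append((step, relevance(step, goal), corrected))
    return trace

def scripted_model(goal: str, trace: list, reanchor: bool = False) -> str:
    """Stub standing in for an LLM: drifts off-topic on step 2 unless re-anchored."""
    if len(trace) == 1 and not reanchor:
        return "Tangent about office snacks"
    return f"Step {len(trace) + 1}: {goal}"

for step, score, corrected in run_with_drift_check("reconcile the March invoices", scripted_model):
    print(f"{score:.2f} corrected={corrected} {step}")
```

The design point is placement. The score is computed and acted on before a step is committed to the trace, not in a separate audit pass afterward.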

This is where most engineers stop, if they ever get here. The infrastructure to do L6 properly does not exist as a product anywhere. You have to build it.

L7 — Emergence Engineering

The system builds itself.

A human plus AI team that knows how to build other human plus AI teams. Emergent building concatenation. New capabilities composed from existing ones at runtime. The system isn't being maintained. It's extending.

L7 is not a software pattern. L7 is a person. Specifically, a person who has internalized L1 through L6 well enough that the team they form with their AI infrastructure starts producing capability that wasn't designed in advance.

L7 cannot be sold as a product. It cannot be compiled. It is the operator at the top of the stack. Every agency you'll ever talk to is selling L1 and pretending it's L7. As far as I know, we're the only people openly discussing what L7 actually is.

The Honest Map

So where is the industry?

99% of "AI agencies": L1, sometimes L2. Charging L7 prices for prompt templates.
Serious engineering teams: L4 Path A. Real harnesses, real value, honest work.
Research labs: Partial L5, scattered L6. Mostly in papers, not products.
SOMA / GNOSYS / SANCREV (us): Full L5 implemented, L6 in production, L7 is the operator.

If you're paying $5K, $10K, $35K for "AI" and what you got was prompts and a dashboard — you got L1 dressed as L5. That's not your fault. The vocabulary doesn't exist outside this site.

Now you have the vocabulary. Next post: what L1 and L2 actually are, and why most people stop there.