Agentic Engineering: Practical Guide
This is a working reference for getting real value out of AI agents.
Not theory. Not prompts.
This is about building a system that actually works in a development environment.
The Goal
Get more value out of AI.
In practice, that means:
- less rework
- less repetition
- fewer “almost correct” outputs
- faster iteration
- more predictable results
The Core Idea
Models do not carry reliable working memory from one session to the next.
If you want continuity, you have to engineer it.
That structure comes from:
- skills
- design documents and ADRs
- tool-agnostic AI artifacts
- specialized agents
- rules
- examples
- verification
The Mental Model
Stop thinking in prompts.
Start thinking in systems.
A good system includes:
- reusable skills - the "how to" dictionary
- persistent artifacts (docs, ADRs, patterns) - the "why we do it this way" dictionary
- specialized agents - your "team members"
- clear task framing
- strong review loops
Engineer the Memory
Keep the durable AI artifacts in plain markdown, checked into git.
Usually that means an ai/ or .ai/ directory.
Then add thin connective tissue for the specific tools you use. Cursor. Claude Code. Whatever comes next.
That way the memory belongs to the project, not to a single AI tool.
One useful habit: ask the agent to review its memory and update the tool-agnostic skills and rules so the next agent starts stronger.
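The "thin connective tissue" idea can be sketched in a few lines. A minimal sync script, assuming a hypothetical layout where tool-agnostic skills live in ai/skills/ and each tool reads its own mirror directory (the .cursor/rules and .claude paths here are illustrative, not official tool locations):

```python
from pathlib import Path
import shutil

def sync_skills(source: Path, tool_dirs: dict[str, Path]) -> list[Path]:
    """Mirror tool-agnostic markdown skills into each tool-specific directory."""
    copied = []
    for tool_dir in tool_dirs.values():
        tool_dir.mkdir(parents=True, exist_ok=True)
        for skill in sorted(source.glob("*.md")):
            target = tool_dir / skill.name
            shutil.copyfile(skill, target)  # thin mirror; ai/ stays the source of truth
            copied.append(target)
    return copied

# Example wiring (paths are assumptions for illustration):
# sync_skills(Path("ai/skills"), {"cursor": Path(".cursor/rules"),
#                                 "claude": Path(".claude/skills")})
```

Because the copies are disposable mirrors, switching tools means changing one dictionary, not rewriting your project memory.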
What a “Skill” Is
A skill is a reusable unit of work.
It tells the agent how to do something properly in your environment.
Examples:
- how to add a field across DB + API + UI
- how to write a GraphQL resolver
- how to run an integration test
- how to debug in the browser
- how to investigate a Jira ticket
When to Create a Skill
Two main signals:
1. The task is repeatable
If you’re doing something more than once, it should probably be a skill.
2. The agent is struggling
If the agent is:
- trying multiple approaches
- taking too long
- producing inconsistent results
That’s a missing skill.
Fix it once. Capture it.
How to Build Skills
- Do the task once
- Capture the steps
- Add constraints and patterns
- Store it as a skill
- Reference it next time
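The steps above can be sketched as a small helper. This is a hypothetical shape, assuming skills are plain markdown files under ai/skills/ as described earlier; the section names are one reasonable convention, not a standard:

```python
from pathlib import Path

def capture_skill(name: str, steps: list[str], constraints: list[str],
                  skills_dir: Path = Path("ai/skills")) -> Path:
    """Write a just-completed task down as a reusable markdown skill."""
    skills_dir.mkdir(parents=True, exist_ok=True)
    body = [f"# Skill: {name}", "", "## Steps"]
    body += [f"{i}. {step}" for i, step in enumerate(steps, 1)]
    body += ["", "## Constraints"]
    body += [f"- {c}" for c in constraints]
    path = skills_dir / f"{name.lower().replace(' ', '-')}.md"
    path.write_text("\n".join(body) + "\n")
    return path
```

The point is the ritual, not the tooling: the moment a task works, its steps and constraints get written down where the next agent will find them.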
Practical Tips
There are lots of skills out there to take inspiration from. A quick search will often help you sharpen yours.
Use AI to help build your skills, but review carefully and don't let AI slop in: mistakes in skills get amplified. Keep skills short and specific to reduce context-window and token costs.
Types of Skills
Direct Skills
Common repeatable work (new pages, DB schema updates and API extensions, releases).
These skills can often spell out their own tests, and they typically deliver a feature, fix, or process.
Callable Skills
Composable units (browser debugging, Playwright, DB access)
Meta Skills
Skills that improve how other skills are used, such as a skill for creating new skills or a task kickoff skill for writing better specs.
Writing Specs
A spec is a detailed description of what needs to be done. For complex work, a lot of leverage comes from the spec. The better the spec, the less cleanup you do later.
A good spec includes:
- Reference relevant skills and design documents
- Define the expected output
- Define the tests and verification
- Break large tasks into smaller, gated steps
- Assign substeps to specialized agents (designer, tester, developer, etc.)
- Define an orchestrator agent that coordinates the specialized agents and ensures the task is completed
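The spec structure above can be made concrete with a sketch. The class and field names here are hypothetical, a minimal shape for a spec with gated steps and per-step agent assignments:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Step:
    description: str
    agent: str            # e.g. "designer", "tester", "developer"
    verification: str     # how this step is checked before the gate opens
    done: bool = False

@dataclass
class Spec:
    goal: str
    skills: list[str] = field(default_factory=list)       # referenced skill files
    design_docs: list[str] = field(default_factory=list)  # ADRs, design documents
    steps: list[Step] = field(default_factory=list)

    def next_step(self) -> Optional[Step]:
        """Gated execution: only the first unfinished step is actionable."""
        return next((s for s in self.steps if not s.done), None)
```

An orchestrator agent walking this structure can only advance one gate at a time, which is exactly the "smaller, gated steps" discipline the list describes.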
Practical Tips
Make screenshots part of the task's deliverables. This forces the agent to run the resulting code and helps avoid the all-too-common scenario where the agent says it's done but the code won't even run.
Agents think in "code", not "app", and struggle with a long-horizon perspective on how the app is actually used. Include end-to-end workflows that spell out each step of using the new feature, and ask the agent to deliver a Playwright test.
You will think you have a tight spec when you kick off the agent. You'll discover you've missed stuff. Make sure the agent knows to check in when unexpected "discoveries" are made.
Don't skimp on reviewing the spec. You'll be building the spec collaboratively with the agent and AI slop can creep in all too easily.
Verification Is the Work
This is the part people skip.
Code generation is cheap. Verification is the bottleneck.
Do not only verify the final answer. Verify the path the agent took to get there.
What to Verify
- the app runs
- the tests pass
- the user workflow works end to end
- the screenshots match the claim
- the agent used real APIs, real tools, and the right files
- there are no hidden fallbacks masking broken infrastructure
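The first items on this checklist can be automated. A minimal sketch, assuming screenshots are delivered as files and the project has some test command; the function name and return shape are illustrative:

```python
from pathlib import Path
import subprocess

def verify_deliverables(screenshots: list[Path], test_cmd: list[str]) -> dict:
    """Check that claimed deliverables actually exist and the tests actually pass."""
    results = {
        # Non-empty screenshots are a proxy for "the agent really ran the app".
        "screenshots_present": all(p.exists() and p.stat().st_size > 0
                                   for p in screenshots),
    }
    # Run the project's test command; a non-zero exit code fails the claim.
    proc = subprocess.run(test_cmd, capture_output=True)
    results["tests_pass"] = proc.returncode == 0
    return results
```

The remaining items (real APIs, right files, no hidden fallbacks) still need human review of the agent's path, which is the point of this section.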
Build Evals from Real Failures
When an agent screws something up, do not just fix the code. Add a test, screenshot check, rubric, or rule that catches that class of failure next time.
You want two kinds of verification: capability checks for new work and regression checks for things that used to work.
Keep a small bank of real tasks and rerun them when the tools, skills, or models change.
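A task bank like this can be very simple. A sketch under assumed conventions: each entry records a real task, whether it is a capability or regression check, and the harness (`run_task` here, a hypothetical callable you supply) knows how to execute and judge one entry:

```python
import json
from pathlib import Path

def load_eval_bank(path: Path) -> list:
    """Each entry is a real task that once failed, plus how to check it now."""
    return json.loads(path.read_text())

def run_evals(bank: list, run_task) -> dict:
    """Rerun every banked task; group results into capability vs regression checks."""
    outcomes = {"capability": [], "regression": []}
    for task in bank:
        passed = run_task(task)  # run_task: your harness for executing one task
        outcomes[task["kind"]].append((task["name"], passed))
    return outcomes
```

Rerunning the bank whenever tools, skills, or models change gives you the two kinds of verification above: new capability on one side, regressions on the other.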
Common Failure Modes
- trusting the first answer
- one-shot prompting when the task needs specs, artifacts, and review
- stale context and missing project memory
- loops, retries, and token burn without progress
- hidden fallbacks that make broken systems look healthy
- cognitive debt - if you stop understanding the code and the business, the bill comes later
Practical Habits
- capture repeatable work
- keep project memory tool agnostic
- turn failures into tests, rules, or skills
- define verification
- keep skills small
Final Principle
AI is a multiplier.
The more structure you give it, the higher the multiplier.