Agentic Engineering: Practical Guide
This is a working reference for getting real value out of AI agents.
Not theory. Not prompts.
This is about building a system that actually works in a development environment.
The Goal
Get more value out of AI.
In practice, that means:
- less rework
- less repetition
- fewer “almost correct” outputs
- faster iteration
- more predictable results
The Core Idea
Models do not carry reliable working memory from one session to the next.
If you want continuity, you have to engineer it.
That structure comes from:
- skills
- design documents and ADRs
- tool-agnostic AI artifacts
- specialized agents
- rules
- examples
- verification
The Mental Model
Stop thinking in prompts.
Start thinking in systems.
A good system includes:
- reusable skills - the "how to" dictionary
- persistent artifacts (docs, ADRs, patterns) - the "why we do it this way" dictionary
- specialized agents - your "team members"
- clear task framing
- strong review loops
Engineer the Memory
Keep the durable AI artifacts in plain markdown, checked into git.
Usually that means an ai/ or .ai/ directory.
Then add thin connective tissue for the specific tools you use. Cursor. Claude Code. Whatever comes next.
That way the memory belongs to the project, not to a single AI tool.
One useful habit: ask the agent to review its memory and update the tool-agnostic skills and rules so the next agent starts stronger.
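The "thin connective tissue" idea can be sketched in a few lines. A minimal sync script, assuming a hypothetical layout where tool-agnostic skills live in ai/skills/ and each tool reads its own mirror directory (the .cursor/rules and .claude paths here are illustrative, not official tool locations):

```python
from pathlib import Path
import shutil

def sync_skills(source: Path, tool_dirs: dict[str, Path]) -> list[Path]:
    """Mirror tool-agnostic markdown skills into each tool-specific directory."""
    copied = []
    for tool_dir in tool_dirs.values():
        tool_dir.mkdir(parents=True, exist_ok=True)
        for skill in sorted(source.glob("*.md")):
            target = tool_dir / skill.name
            shutil.copyfile(skill, target)  # thin mirror; ai/ stays the source of truth
            copied.append(target)
    return copied

# Example wiring (paths are assumptions for illustration):
# sync_skills(Path("ai/skills"), {"cursor": Path(".cursor/rules"),
#                                 "claude": Path(".claude/skills")})
```

Because the copies are disposable mirrors, switching tools means changing one dictionary, not rewriting your project memory.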
What a “Skill” Is
A skill is a reusable unit of work.
It tells the agent how to do something properly in your environment.
Examples:
- how to add a field across DB + API + UI
- how to write a GraphQL resolver
- how to run an integration test
- how to debug in the browser
- how to investigate a Jira ticket
When to Create a Skill
Two main signals:
1. The task is repeatable
If you’re doing something more than once, it should probably be a skill.
2. The agent is struggling
If the agent is:
- trying multiple approaches
- taking too long
- producing inconsistent results
That’s a missing skill.
Fix it once. Capture it.
How to Build Skills
- Do the task once
- Capture the steps
- Add constraints and patterns
- Store it as a skill
- Reference it next time
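The steps above can be sketched as a small helper. This is a hypothetical shape, assuming skills are plain markdown files under ai/skills/ as described earlier; the section names are one reasonable convention, not a standard:

```python
from pathlib import Path

def capture_skill(name: str, steps: list[str], constraints: list[str],
                  skills_dir: Path = Path("ai/skills")) -> Path:
    """Write a just-completed task down as a reusable markdown skill."""
    skills_dir.mkdir(parents=True, exist_ok=True)
    body = [f"# Skill: {name}", "", "## Steps"]
    body += [f"{i}. {step}" for i, step in enumerate(steps, 1)]
    body += ["", "## Constraints"]
    body += [f"- {c}" for c in constraints]
    path = skills_dir / f"{name.lower().replace(' ', '-')}.md"
    path.write_text("\n".join(body) + "\n")
    return path
```

The point is the ritual, not the tooling: the moment a task works, its steps and constraints get written down where the next agent will find them.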
Practical Tips
There are lots of skills out there to take inspiration from. A quick search will often help you sharpen yours.
Use AI to help build your skills, but review carefully and don't let AI slop in: mistakes in skills get amplified. Keep skills short and specific to reduce context-window and token costs.
Types of Skills
Direct Skills
Common repeatable work (new pages, DB schema updates and API extensions, releases).
These skills can often spell out their own tests, and they typically deliver a feature, fix, or process.
Callable Skills
Composable units (browser debugging, Playwright, DB access)
Meta Skills
Skills that improve how other skills are used, such as a skill for creating new skills or a task kickoff skill for writing better specs.
Writing Specs
A spec is a detailed description of what needs to be done. For complex work, a lot of leverage comes from the spec. The better the spec, the less cleanup you do later.
A good spec includes:
- Reference relevant skills and design documents
- Define the expected output
- Define the tests and verification
- Break large tasks into smaller, gated steps
- Assign substeps to specialized agents (designer, tester, developer, etc.)
- Define an orchestrator agent that coordinates the specialized agents and ensures the task is completed
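The spec structure above can be made concrete with a sketch. The class and field names here are hypothetical, a minimal shape for a spec with gated steps and per-step agent assignments:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Step:
    description: str
    agent: str            # e.g. "designer", "tester", "developer"
    verification: str     # how this step is checked before the gate opens
    done: bool = False

@dataclass
class Spec:
    goal: str
    skills: list[str] = field(default_factory=list)       # referenced skill files
    design_docs: list[str] = field(default_factory=list)  # ADRs, design documents
    steps: list[Step] = field(default_factory=list)

    def next_step(self) -> Optional[Step]:
        """Gated execution: only the first unfinished step is actionable."""
        return next((s for s in self.steps if not s.done), None)
```

An orchestrator agent walking this structure can only advance one gate at a time, which is exactly the "smaller, gated steps" discipline the list describes.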
Practical Tips
Make screenshots part of the task's deliverables. This forces the agent to run the resulting code and helps avoid the all-too-common scenario where the agent says it's done but the code won't even run.
Agents think in "code", not "app", and struggle with a long-horizon perspective on how the app is actually used. Include end-to-end workflows that spell out each step of using the new feature, and ask the agent to deliver a Playwright test.
You will think you have a tight spec when you kick off the agent. You'll discover you've missed stuff. Make sure the agent knows to check in when unexpected "discoveries" are made.
Don't skimp on reviewing the spec. You'll be building the spec collaboratively with the agent and AI slop can creep in all too easily.
Verification Is the Work
This is the part people skip.
Code generation is cheap. Verification is the bottleneck.
Do not only verify the final answer. Verify the path the agent took to get there.
What to Verify
- the app runs
- the tests pass
- the user workflow works end to end
- the screenshots match the claim
- the agent used real APIs, real tools, and the right files
- there are no hidden fallbacks masking broken infrastructure
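The first items on this checklist can be automated. A minimal sketch, assuming screenshots are delivered as files and the project has some test command; the function name and return shape are illustrative:

```python
from pathlib import Path
import subprocess

def verify_deliverables(screenshots: list[Path], test_cmd: list[str]) -> dict:
    """Check that claimed deliverables actually exist and the tests actually pass."""
    results = {
        # Non-empty screenshots are a proxy for "the agent really ran the app".
        "screenshots_present": all(p.exists() and p.stat().st_size > 0
                                   for p in screenshots),
    }
    # Run the project's test command; a non-zero exit code fails the claim.
    proc = subprocess.run(test_cmd, capture_output=True)
    results["tests_pass"] = proc.returncode == 0
    return results
```

The remaining items (real APIs, right files, no hidden fallbacks) still need human review of the agent's path, which is the point of this section.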
Build Evals from Real Failures
When an agent screws something up, do not just fix the code. Add a test, screenshot check, rubric, or rule that catches that class of failure next time.
You want two kinds of verification: capability checks for new work and regression checks for things that used to work.
Keep a small bank of real tasks and rerun them when the tools, skills, or models change.
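A task bank like this can be very simple. A sketch under assumed conventions: each entry records a real task, whether it is a capability or regression check, and the harness (`run_task` here, a hypothetical callable you supply) knows how to execute and judge one entry:

```python
import json
from pathlib import Path

def load_eval_bank(path: Path) -> list:
    """Each entry is a real task that once failed, plus how to check it now."""
    return json.loads(path.read_text())

def run_evals(bank: list, run_task) -> dict:
    """Rerun every banked task; group results into capability vs regression checks."""
    outcomes = {"capability": [], "regression": []}
    for task in bank:
        passed = run_task(task)  # run_task: your harness for executing one task
        outcomes[task["kind"]].append((task["name"], passed))
    return outcomes
```

Rerunning the bank whenever tools, skills, or models change gives you the two kinds of verification above: new capability on one side, regressions on the other.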
Common Failure Modes
- trusting the first answer
- one-shot prompting when the task needs specs, artifacts, and review
- stale context and missing project memory
- loops, retries, and token burn without progress
- hidden fallbacks that make broken systems look healthy
- cognitive debt - if you stop understanding the code and the business, the bill comes later
Practical Habits
- capture repeatable work
- keep project memory tool agnostic
- turn failures into tests, rules, or skills
- define verification
- keep skills small
Final Principle
AI is a multiplier.
The more structure you give it, the higher the multiplier.