Agentic Engineering: Practical Guide

This is a working reference for getting real value out of AI agents.

Not theory. Not prompts.

This is about building a system that actually works in a development environment.


The Goal

Get more value out of AI.

In practice, that means:

  • less rework
  • less repetition
  • fewer “almost correct” outputs
  • faster iteration
  • more predictable results

The Core Idea

Models do not carry reliable working memory from one session to the next.

If you want continuity, you have to engineer it.

That structure comes from:

  • skills
  • design documents and ADRs
  • tool-agnostic AI artifacts
  • specialized agents
  • rules
  • examples
  • verification

The Mental Model

Stop thinking in prompts.

Start thinking in systems.

A good system includes:

  • reusable skills - the "how to" dictionary
  • persistent artifacts (docs, ADRs, patterns) - the "why we do it this way" dictionary
  • specialized agents - your "team members"
  • clear task framing
  • strong review loops

Engineer the Memory

Keep the durable AI artifacts in plain markdown, checked into git.

Usually that means an ai/ or .ai/ directory.
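
As a sketch, such a directory might look like this (file and folder names are illustrative, not a standard):

```
ai/
  skills/
    add-db-field.md
    graphql-resolver.md
    browser-debugging.md
  adrs/
    0001-use-postgres.md
  rules.md
  examples/
```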

Then add thin connective tissue for the specific tools you use. Cursor. Claude Code. Whatever comes next.

That way the memory belongs to the project, not to a single AI tool.

One useful habit: ask the agent to review its memory and update the tool-agnostic skills and rules so the next agent starts stronger.

What a “Skill” Is

A skill is a reusable unit of work.

It tells the agent how to do something properly in your environment.

Examples:

  • how to add a field across DB + API + UI
  • how to write a GraphQL resolver
  • how to run an integration test
  • how to debug in the browser
  • how to investigate a Jira ticket

When to Create a Skill

Two main signals:

1. The task is repeatable

If you’re doing something more than once, it should probably be a skill.

2. The agent is struggling

If the agent is:

  • trying multiple approaches
  • taking too long
  • producing inconsistent results

That’s a missing skill.

Fix it once. Capture it.

How to Build Skills

  1. Do the task once
  2. Capture the steps
  3. Add constraints and patterns
  4. Store it as a skill
  5. Reference it next time
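
Applied to the first example above, a stored skill might be a short markdown file like this (the file name, sections, and constraints are illustrative assumptions, not a required format):

```markdown
# Skill: add a field across DB + API + UI

## When to use
Adding a new persisted field that must surface in the API and the UI.

## Steps
1. Add the column in a new migration; never edit old migrations.
2. Extend the API schema and resolver.
3. Wire the field into the relevant UI form and list view.
4. Run the integration tests before claiming done.

## Constraints
- Follow the existing naming conventions in the schema.
- Migrations must be reversible.

## Verification
- The migration applies and rolls back cleanly.
- The integration tests pass.
```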

Practical Tips

There are lots of skills out there to take inspiration from. A quick search will often help you sharpen yours.

Use AI to help build your skills, but review carefully and don't let AI slop in. Mistakes in skills get amplified. Keep skills short and specific to reduce context-window usage and token costs.

Types of Skills

Direct Skills

Common repeatable work (new pages, DB schema updates and API extensions, releases)

These skills can often spell out tests and typically deliver a feature, fix, or process.

Callable Skills

Composable units (browser debugging, Playwright, DB access)

Meta Skills

Skills that improve how other skills are used, such as a skill for creating new skills or a task kickoff skill for writing better specs.

Writing Specs

A spec is a detailed description of what needs to be done. For complex work, a lot of leverage comes from the spec. The better the spec, the less cleanup you do later.

A good spec includes:

  1. references to relevant skills and design documents
  2. the expected output
  3. the tests and verification steps
  4. large tasks broken into smaller, gated steps
  5. substeps assigned to specialized agents (designer, tester, developer, etc.)
  6. an orchestrator agent that coordinates the specialized agents and ensures the task is completed
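
Put together, a spec might be sketched as a short document like this (the feature, file names, and agent roles are illustrative):

```markdown
# Spec: export orders to CSV

## Context
- Skills: db-access.md, new-page.md
- Design docs: adr-0007-reporting.md

## Expected output
- An "Export CSV" button on the orders page that downloads all visible orders.

## Verification
- A Playwright test covering the end-to-end export workflow.
- Screenshots of the button and of the downloaded file opened.

## Steps (gated; check in before moving on)
1. Backend endpoint (developer agent)
2. UI button and download flow (developer agent)
3. End-to-end test (tester agent)

## Orchestration
An orchestrator agent assigns steps, reviews each gate, and reports
discoveries that change the spec instead of silently working around them.
```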

Practical Tips

Make screenshots part of the deliverables of the task. This forces the agent to run the resulting code and helps avoid the all-too-common scenario where the agent says it's done but the code won't even run.

Agents think in "code" and not "app" and struggle with long-horizon perspectives on how the app is actually used. Include an end-to-end workflow that spells out each step of using the new feature, and ask the agent to deliver a Playwright test.

You will think you have a tight spec when you kick off the agent. You'll discover you've missed stuff. Make sure the agent knows to check in when unexpected "discoveries" are made.

Don't skimp on reviewing the spec. You'll be building the spec collaboratively with the agent and AI slop can creep in all too easily.

Verification Is the Work

This is the part people skip.

Code generation is cheap. Verification is the bottleneck.

Do not only verify the final answer. Verify the path the agent took to get there.

What to Verify

  • the app runs
  • the tests pass
  • the user workflow works end to end
  • the screenshots match the claim
  • the agent used real APIs, real tools, and the right files
  • there are no hidden fallbacks masking broken infrastructure

Build Evals from Real Failures

When an agent screws something up, do not just fix the code. Add a test, screenshot check, rubric, or rule that catches that class of failure next time.

You want two kinds of verification: capability checks for new work and regression checks for things that used to work.

Keep a small bank of real tasks and rerun them when the tools, skills, or models change.
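
One lightweight way to keep such a bank is a list of tasks plus a small runner. This sketch is an assumption about how you might structure it (the task names, the `kind` labels, and the `check` convention are all illustrative): each task carries a check that returns True if the output still passes, and the runner buckets failures into capability vs. regression.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalTask:
    name: str
    kind: str                  # "capability" (new work) or "regression" (used to work)
    check: Callable[[], bool]  # returns True if the output still passes

def run_eval_bank(tasks: list[EvalTask]) -> dict[str, list[str]]:
    """Rerun every task and bucket the failing task names by kind."""
    failures: dict[str, list[str]] = {"capability": [], "regression": []}
    for task in tasks:
        try:
            ok = task.check()
        except Exception:
            ok = False         # a crashing check counts as a failure
        if not ok:
            failures[task.kind].append(task.name)
    return failures

# Illustrative bank: two regression checks and one capability check.
bank = [
    EvalTask("csv-export-header", "regression",
             lambda: "id,total" in "id,total\n1,9.99"),
    EvalTask("login-flow", "regression", lambda: True),
    EvalTask("new-report-page", "capability", lambda: False),
]
print(run_eval_bank(bank))  # → {'capability': ['new-report-page'], 'regression': []}
```

Rerun the same bank after a tool, skill, or model change; a new name in the regression bucket is exactly the signal you want early.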

Common Failure Modes

  • trusting the first answer
  • one-shot prompting when the task needs specs, artifacts, and review
  • stale context and missing project memory
  • loops, retries, and token burn without progress
  • hidden fallbacks that make broken systems look healthy
  • cognitive debt - if you stop understanding the code and the business, the bill comes later

Practical Habits

  • capture repeatable work
  • keep project memory tool agnostic
  • turn failures into tests, rules, or skills
  • define verification
  • keep skills small

Final Principle

AI is a multiplier.

The more structure you give it, the higher the multiplier.