2. Building an Agent
This chapter covers
- Agents and Environments
- Goals and Plans
- Autonomy and Alignment
- Context Engineering
- The Desktop Warrior
Your desktop is spotless. Completely empty, adorned by a carefully curated wallpaper highlighting your personality. Except for one folder: "Desktop". Inside Desktop, digital chaos reigns: old screenshots, old documents, hastily cloned GitHub repos, never deleted. And, of course, another folder called "Desktop". The battle for order was lost long ago to Desktop's recursion depth.
In this chapter, we’ll build a local AI agent to fight that battle. This Desktop Warrior is simple enough to follow, yet complex enough to reveal the patterns we will encounter when building ambitious, distributed agentic applications.
Figure 2.1: The Desktop Warrior, interacting with the user and the file system
2.1 Agents in their Environments
Systems engineering depends on accurate and concise mental models to reason about complex systems. Before we tackle the Desktop Warrior, we need the mental models to reason rigorously about abstract concepts like autonomy and alignment, as well as fuzzy concepts like prompt engineering.
Complex systems often contain aspects that resist precise definition. When does an agent make the best decision? When is a desktop well organized? How can we reason rigorously about systems when key concepts remain undefined?
Throughout this chapter, we make use of uninterpreted functions—functions with defined interfaces but unspecified implementations. We will use two types of uninterpreted functions: one that maps its arguments onto a boolean value, and another that maps its arguments onto a numerical value. Boolean values answer simple yes or no questions, while numerical values allow us to compare different outcomes.
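To make the idea concrete, here is a minimal Python sketch of the two interfaces. The names Predicate, Fitness, and Desktop are illustrative stand-ins, not part of any library; the point is that each interface is fully specified while every implementation is left open:

```python
from typing import Protocol


class Desktop:
    """Hypothetical stand-in for whatever state we reason about."""


class Predicate(Protocol):
    """Uninterpreted boolean function: state in, yes/no out.
    The interface is fixed; the implementation is unspecified."""
    def __call__(self, state: Desktop) -> bool: ...


class Fitness(Protocol):
    """Uninterpreted numerical function: state in, score out.
    Higher scores indicate better outcomes."""
    def __call__(self, state: Desktop) -> float: ...
```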
For example, to express preferences, we simply postulate a fitness function U that maps inputs to numerical values, where higher values indicate better outcomes:
- Preference—If U maps a desktop without screenshots to a higher value than a desktop with screenshots, then a desktop without screenshots is preferable:
  U(Desktop[without screenshots]) > U(Desktop[with screenshots])
- Equivalence—If U maps a desktop without screenshots to the same value as a desktop with screenshots, then U doesn't express a preference; they are equivalent:
  U(Desktop[without screenshots]) = U(Desktop[with screenshots])
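Read as code, preference and equivalence reduce to plain comparisons over U's outputs. The sketch below is a toy under stated assumptions: Desktop is modeled as a bare list of file names, and the throwaway U at the bottom (which penalizes screenshots) exists only to exercise the interface; U itself remains uninterpreted:

```python
from typing import Callable

# Hypothetical stand-ins: the model works for any state type and any
# numerical fitness function.
Desktop = list[str]                   # here: just the file names on the desktop
Fitness = Callable[[Desktop], float]


def preferable(U: Fitness, a: Desktop, b: Desktop) -> bool:
    """a is preferable to b exactly when U ranks a strictly higher."""
    return U(a) > U(b)


def equivalent(U: Fitness, a: Desktop, b: Desktop) -> bool:
    """U expresses no preference between a and b."""
    return U(a) == U(b)


# Throwaway implementation, only to exercise the interface:
# every screenshot on the desktop costs one point.
U: Fitness = lambda files: -sum(name.endswith(".png") for name in files)

assert preferable(U, ["report.pdf"], ["report.pdf", "screenshot.png"])
assert equivalent(U, ["a.pdf"], ["b.pdf"])
```

Notice that nothing in preferable or equivalent depends on what U actually computes; that is exactly the separation of interface from implementation the next paragraph relies on.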
By separating the interface (what we need for reasoning) from the implementation (details we may not know or care about), we concentrate all uncertainty into U. This makes the rest of our model crisp: we can reason rigorously about agent behavior without defining what "better" actually means.