
AI Agent Architecture, Explained for Decision-Makers

Orange ITS — AI engineering team 7 min read

A vendor sends you a proposal. The deck shows a diagram with boxes and arrows: an LLM in the middle, some databases on the side, a few API icons. It looks complete. But the diagram doesn’t tell you whether the system will behave reliably under load, what happens when an API goes down, how much each conversation costs to run, or whether you’ll be able to switch providers in two years.

Those answers live in the architecture. And you don’t need to be an engineer to ask the right questions about it.

This article translates AI agent architecture into the terms that matter for procurement: reliability, operating cost, and vendor dependency. Think of it as a checklist for auditing a vendor’s design before you sign.


The Four Pillars Every AI Agent Architecture Has (Whether the Vendor Labels Them or Not)

A production-grade AI agent is built from four logical components. Vendors name them differently, combine them in various ways, and sometimes bundle them inside a single platform you can’t see inside. But they’re always there.

1. The Planner (the “thinking” layer)

The planner is the large language model (LLM) at the core of the agent. It receives a goal or a user message, reasons about what to do next, and decides which tools to call. Some architectures use a single LLM call per step; others run multiple rounds of reasoning before acting.
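
To make the planner's role concrete, here is a minimal sketch of the loop that agent frameworks run in some form. The agent asks the planner what to do, executes the chosen tool, feeds the result back, and repeats until the planner returns an answer. The `plan` function stands in for an LLM call. All names (`Decision`, `run_agent`, `fake_plan`, `lookup_order`) are hypothetical and not any vendor's API.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Decision:
    """What the planner decided to do next (hypothetical structure)."""
    tool: Optional[str] = None              # tool to call, or None when finished
    args: dict = field(default_factory=dict)
    answer: Optional[str] = None            # final reply when tool is None


Planner = Callable[[list], Decision]


def run_agent(goal: str, plan: Planner, tools: dict,
              max_steps: int = 5) -> Optional[str]:
    """Plan, act, observe, with a hard step budget."""
    history = [("user", goal)]
    for _ in range(max_steps):
        decision = plan(history)            # in production: an LLM call
        if decision.tool is None:
            return decision.answer
        result = tools[decision.tool](**decision.args)
        history.append(("tool", f"{decision.tool}: {result}"))
    return None  # budget exhausted: hand off to a human instead of looping


# Stub planner standing in for the LLM: look up the order once, then answer.
def fake_plan(history: list) -> Decision:
    if history[-1][0] == "user":
        return Decision(tool="lookup_order", args={"order_id": "A17"})
    return Decision(answer=f"Answer based on {history[-1][1]}")


tools = {"lookup_order": lambda order_id: f"{order_id} shipped"}
print(run_agent("Where is order A17?", fake_plan, tools))
# Answer based on lookup_order: A17 shipped
```

Whether the architecture makes one LLM call per step or several rounds of reasoning is hidden inside `plan`. The `max_steps` budget is the simplest defence against a planner that never decides it is done.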

What to ask a vendor:

  • Which LLM powers the planner, and how is it updated? (A model upgrade that changes behaviour without notice is a reliability risk.)
  • LLM output is probabilistic: the same input can produce different answers. How does the design constrain that variability, and can you set confidence thresholds or fallback conditions?
  • How is the planner prompted? Do you have any visibility into or control over the system prompt?

2. Tools (the “hands” of the agent)

Tools are the connections to the outside world: database queries, API calls, form submissions, file reads, email sends. An agent without tools is just a chatbot. The tools layer is where agents actually do things.

The number, reliability, and scope of tools determine what the agent can accomplish — and what it might do by accident. A tool that can send emails is useful; one that can send emails to anyone without an approval step is a liability.

What to ask a vendor:

  • Which tools does the agent have access to, and can that set be restricted?
  • Are tool calls logged in an auditable way?
  • What happens when a tool call fails — does the agent retry, escalate to a human, or silently fail?
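
These three questions map onto a thin layer that sits between the planner and the tools. The sketch below is minimal and uses hypothetical names throughout (`Tool`, `ToolGate`, `read_faq`, `send_email`). An allowlist restricts which tools can run, and a flag marks the tools that need sign-off. Every call lands in an audit log, and failures are recorded and raised rather than swallowed.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Tool:
    fn: Callable[..., str]
    needs_approval: bool = False            # e.g. anything that sends or pays


class ToolGate:
    """Allowlist + approval + audit log around every tool call (hypothetical)."""

    def __init__(self, tools: dict, allowed: set,
                 approve: Callable[[str, dict], bool]):
        self.tools = tools
        self.allowed = allowed
        self.approve = approve              # human (or policy) sign-off hook
        self.audit_log: list = []           # (tool, args, outcome) tuples

    def call(self, name: str, **args) -> str:
        if name not in self.allowed:
            self.audit_log.append((name, args, "blocked"))
            raise PermissionError(f"tool '{name}' is not allowed")
        tool = self.tools[name]
        if tool.needs_approval and not self.approve(name, args):
            self.audit_log.append((name, args, "rejected"))
            raise PermissionError(f"tool '{name}' was not approved")
        try:
            result = tool.fn(**args)
        except Exception as exc:
            # Record the failure and re-raise: the caller decides whether to
            # retry or escalate, but the failure is never silent.
            self.audit_log.append((name, args, f"error: {exc}"))
            raise
        self.audit_log.append((name, args, "ok"))
        return result


tools = {
    "read_faq": Tool(lambda topic: f"FAQ entry for {topic}"),
    "send_email": Tool(lambda to, body: f"sent to {to}", needs_approval=True),
}
# Hypothetical approval policy: only internal addresses are auto-approved;
# in practice this hook might route the request to a human reviewer.
gate = ToolGate(tools, allowed={"read_faq", "send_email"},
                approve=lambda name, args: args["to"].endswith("@example.com"))
print(gate.call("read_faq", topic="returns"))  # FAQ entry for returns
```

The useful property for an audit is that restriction, approval and logging live in one place outside the LLM, so they hold regardless of what the planner decides.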

3. Memory (what the agent remembers)

Memory in AI agent architecture is more nuanced than it sounds. There are at least three distinct types:

  • Short-term memory: the current conversation context, held in the LLM’s context window. It resets when the session ends.
  • Long-term memory: facts stored in a database and retrieved when relevant — customer history, product knowledge, prior interactions.
  • Procedural memory: rules, workflows, and patterns baked into the agent’s instructions or fine-tuning.

Memory design has a direct impact on user experience and cost. Agents with well-designed long-term memory feel coherent and context-aware. Agents relying purely on short-term memory re-ask questions users already answered, and carrying long histories in the context window drives up token spend.
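
The three memory types are easier to tell apart side by side. The sketch below is a hypothetical, in-process illustration. Procedural memory is a fixed rule list, and short-term memory is a bounded window of recent turns. Long-term memory is a keyed store that can be corrected or cleared. A real system would use a database and a proper retrieval method, and the class and method names are invented for illustration.

```python
from collections import deque


class AgentMemory:
    """Three memory types side by side (illustrative, in-process only)."""

    def __init__(self, rules: list, window: int = 6):
        self.procedural = rules                    # fixed instructions
        self.short_term = deque(maxlen=window)     # recent turns only
        self.long_term: dict = {}                  # fact key -> fact text

    def remember(self, key: str, fact: str) -> None:
        self.long_term[key] = fact                 # also used to correct a fact

    def forget(self, key: str) -> None:
        self.long_term.pop(key, None)              # e.g. an erasure request

    def add_turn(self, role: str, text: str) -> None:
        self.short_term.append(f"{role}: {text}")  # oldest turn drops out

    def build_context(self, query: str) -> str:
        # Naive keyword retrieval; production systems typically use search
        # or embeddings, but the idea is the same: fetch only what's relevant.
        words = set(query.lower().split())
        facts = [fact for key, fact in self.long_term.items() if key in words]
        return "\n".join(self.procedural + facts + list(self.short_term))


mem = AgentMemory(rules=["Answer in English."], window=2)
mem.remember("delivery", "Customer prefers Tuesday delivery.")
mem.add_turn("user", "Hi")
mem.add_turn("agent", "Hello")
mem.add_turn("user", "Can you change my delivery?")
print(mem.build_context("change my delivery"))
```

With a two-turn window, the greeting has already dropped out of short-term memory, but the delivery preference is still retrieved from long-term memory because it is relevant to the query.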

What to ask a vendor:

  • What memory types does the architecture include?
  • Where is customer data stored, and under which jurisdiction? (For Swiss businesses, this matters for nFADP compliance — see our article on AI Agents and Swiss Data Protection.)
  • Can long-term memory be cleared or corrected if it stores incorrect information?

4. Guardrails (the safety and control layer)

Guardrails are the constraints that keep the agent on-task and within acceptable behaviour. They operate at several levels: input filtering (blocking harmful or out-of-scope requests), output validation (checking responses before they’re sent), scope limits (preventing the agent from taking actions outside its defined role), and escalation logic (knowing when to hand off to a human).

This is the component most often underspecified in vendor demos. A demo works because the vendor controls the inputs. Production works because the guardrails hold when inputs are unpredictable.
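
Each guardrail level can be a plain, testable check that runs outside the LLM. This is a minimal sketch for a hypothetical customer-service agent. The topic list, the CHF 500 approval limit and the card-number pattern are placeholder policy values. The keyword matching is also far cruder than a production input filter would be.

```python
import re
from dataclasses import dataclass

# Hypothetical policy values; a real deployment would load these from config.
ALLOWED_TOPICS = {"order", "delivery", "invoice", "return"}
APPROVAL_LIMIT_CHF = 500
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){13,16}\b")  # crude card-number check


@dataclass
class Verdict:
    allowed: bool
    reason: str


def check_input(message: str) -> Verdict:
    """Input filter: reject requests outside the agent's defined scope."""
    words = set(re.findall(r"[a-z]+", message.lower()))
    if not words & ALLOWED_TOPICS:
        return Verdict(False, "out of scope")
    return Verdict(True, "ok")


def check_action(action: str, amount_chf: float) -> Verdict:
    """Scope limit + escalation: large refunds go to a human."""
    if action == "refund" and amount_chf > APPROVAL_LIMIT_CHF:
        return Verdict(False, "escalate to human")
    return Verdict(True, "ok")


def check_output(reply: str) -> Verdict:
    """Output validation: never send something that looks like a card number."""
    if CARD_PATTERN.search(reply):
        return Verdict(False, "possible card number in reply")
    return Verdict(True, "ok")


print(check_input("Write me a poem").reason)     # out of scope
print(check_action("refund", 1200).reason)       # escalate to human
```

For procurement, what matters is less the code than where it runs. Checks like these behave deterministically and can be tested against exactly the unpredictable inputs that a demo avoids.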

What to ask a vendor:

  • What happens when a user tries to make the agent do something outside its scope?
  • Is there a human-in-the-loop option for high-stakes actions (large transactions, sensitive data access)?
  • How are guardrail rules updated as the business evolves?

How Architecture Choices Drive Your Operating Costs

AI agent costs aren’t just a licence fee. The architecture determines a significant portion of the per-interaction cost, which compounds at scale.

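The main cost driver is LLM token consumption. With most LLM APIs, providers bill for the tokens sent to the model and the tokens it generates, and everything in the context window is sent again on each call. An agent that passes the full conversation history on every step, rather than using structured long-term memory, sends more input with every turn. The cost of each step grows linearly with conversation length, so the total cost of a conversation grows roughly with the square of its length. For a company handling hundreds of interactions per day, that architectural choice locks in the cost curve.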
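
A back-of-the-envelope model makes the difference visible. Assume, hypothetically, that every turn adds about 200 tokens and that each step resends the history as input. With full history, turn t costs t turns' worth of input, so the total over n turns is n(n+1)/2 turns' worth. Capping the window, with older context moved to long-term memory, keeps the input for each step flat. The figures are illustrative, not a pricing estimate, and they ignore output tokens and the system prompt.

```python
from typing import Optional


def cumulative_input_tokens(turns: int, tokens_per_turn: int,
                            window: Optional[int] = None) -> int:
    """Total input tokens sent over a conversation (illustrative model).

    On turn t the agent resends its history: all t turns if window is None,
    otherwise at most the last `window` turns.
    """
    total = 0
    for t in range(1, turns + 1):
        kept = t if window is None else min(t, window)
        total += kept * tokens_per_turn
    return total


full = cumulative_input_tokens(turns=20, tokens_per_turn=200)
windowed = cumulative_input_tokens(turns=20, tokens_per_turn=200, window=4)
print(full, windowed)  # 42000 14800
```

Under the full-history assumption, doubling the conversation to 40 turns roughly quadruples the input tokens (164,000 versus 42,000). With the four-turn window, the input roughly doubles (30,800 versus 14,800).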

Other cost drivers worth examining:

  • Tool call frequency: more tool calls per interaction means more latency and sometimes additional third-party API costs.
  • Model tier: some architectures use expensive frontier models for every step; others reserve those for complex reasoning and use lighter models for simpler subtasks (a pattern sometimes called a “model cascade”).
  • Retry logic: an agent that retries failed tool calls without circuit-breaker logic can multiply costs during outages.
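
The retry point is worth seeing in code. A circuit breaker counts consecutive failures of a dependency, such as an LLM API or a third-party tool. Past a threshold, it stops calling that dependency for a cool-down period, so an outage does not turn into a flood of paid, doomed retries. This is a minimal sketch: the thresholds are placeholders, and the injectable `clock` exists only to make the behaviour testable.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Stop calling a failing dependency for a cool-down period.

    After `max_failures` consecutive failures the breaker opens, and calls
    fail fast (no request, no cost) until `reset_after` seconds have passed.
    """

    def __init__(self, max_failures: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, fn: Callable[[], str]) -> str:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None          # cool-down over: allow a trial call
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()  # (re)open the circuit
            raise
        self.failures = 0                  # success resets the count
        return result


# Usage: ten attempts against a dead API make only three real calls.
calls = 0

def flaky_api() -> str:
    global calls
    calls += 1
    raise ConnectionError("API down")

breaker = CircuitBreaker(max_failures=3, reset_after=30.0)
for _ in range(10):
    try:
        breaker.call(flaky_api)
    except (ConnectionError, RuntimeError):
        pass
print(calls)  # 3
```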

Vendor Lock-In Lives in the Architecture, Not the Contract

The most overlooked procurement risk in AI agent projects is architectural lock-in. A contract can include exit clauses; an architecture cannot always be migrated cheaply.

Lock-in typically accumulates in three places:

Proprietary memory stores: if the agent’s long-term memory is held in a vendor-specific database format, migration means re-importing and re-validating all historical context — often months of accumulated data.

Hard-coded tool integrations: agents built directly on a vendor’s SDK using proprietary tool-calling APIs require rewriting when you move platforms. Compare this to agents built on open standards like the Model Context Protocol (MCP), where tools are more portable. We cover this in more depth in AI Agent Platform Lock-In: The Risks Nobody Prices In.

Model dependency: if the planner logic was heavily prompt-engineered around one specific model’s quirks (say, a model that’s since been deprecated or significantly updated), migrating to a different LLM may require re-tuning the entire system.

The safest architectures keep these layers loosely coupled: the planner can swap models; the memory layer uses standard retrieval patterns; the tools connect via documented APIs or open protocols. This isn’t always achievable on a budget or timeline, but it’s the question to ask.
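
Loose coupling is mostly a question of where the seams are. In the sketch below, the agent talks to models only through a small interface, so swapping or adding a provider means writing one adapter. The same seam supports a cost-tiered cascade and failover when an API is down. `ChatModel`, `ModelRouter` and `FakeModel` are hypothetical names. A real adapter would wrap a provider's SDK, which is deliberately not shown here.

```python
from typing import Protocol


class ChatModel(Protocol):
    """The only thing the rest of the agent knows about an LLM."""
    name: str

    def complete(self, prompt: str) -> str: ...


class ModelRouter:
    """Model cascade with failover, written against the interface only.

    Simple tasks try the cheap model(s) first and fall back to the strong
    ones; complex tasks go straight to the strong models. If a provider
    fails, the next one in the list is tried.
    """

    def __init__(self, cheap: list[ChatModel], strong: list[ChatModel]):
        self.cheap = cheap
        self.strong = strong

    def complete(self, prompt: str, complex_task: bool) -> str:
        candidates = self.strong if complex_task else self.cheap + self.strong
        errors = []
        for model in candidates:
            try:
                return model.complete(prompt)
            except Exception as exc:
                errors.append(f"{model.name}: {exc}")
        raise RuntimeError("all models failed: " + "; ".join(errors))


class FakeModel:
    """Stand-in adapter; a real one would wrap a provider's SDK."""

    def __init__(self, name: str, up: bool = True):
        self.name, self.up = name, up

    def complete(self, prompt: str) -> str:
        if not self.up:
            raise ConnectionError("unavailable")
        return f"[{self.name}] {prompt}"


router = ModelRouter(cheap=[FakeModel("small-model", up=False)],
                     strong=[FakeModel("frontier-a"), FakeModel("frontier-b")])
print(router.complete("Classify this ticket", complex_task=False))
# [frontier-a] Classify this ticket
```

If prompts and planner logic sit on the agent side of this interface, changing LLM provider becomes a question of re-testing behaviour rather than rewriting integrations. The model-specific prompt tuning described above is still a real migration cost.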


A Practical Audit Checklist Before You Commit

When evaluating an AI agent proposal, these questions surface design quality faster than any demo:

Reliability

  • What is the failover behaviour when the LLM API is unavailable?
  • How does the agent handle ambiguous or conflicting inputs?
  • Is there a mechanism to detect and break infinite loops or runaway tool chains?
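
A loop-breaking mechanism can be sketched in a few lines. This minimal sketch assumes two hypothetical limits: at most 20 tool calls per task, and no more than 3 calls to the same tool with identical arguments. The agent calls `check` before each tool invocation. A raised error stops the run so it can be escalated instead of silently consuming tokens.

```python
import json
from collections import Counter


class LoopGuard:
    """Break runaway tool chains: cap total calls and identical repeats."""

    def __init__(self, max_calls: int = 20, max_repeats: int = 3):
        self.max_calls = max_calls
        self.max_repeats = max_repeats
        self.seen: Counter = Counter()
        self.total = 0

    def check(self, tool: str, args: dict) -> None:
        """Call before every tool invocation; raises to stop the run."""
        self.total += 1
        # Serialise args so that nested dicts/lists can be counted as keys.
        key = (tool, json.dumps(args, sort_keys=True, default=str))
        self.seen[key] += 1
        if self.total > self.max_calls:
            raise RuntimeError("tool-call budget exceeded")
        if self.seen[key] > self.max_repeats:
            raise RuntimeError(f"'{tool}' repeated with identical arguments")
```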

Cost

  • What is the estimated per-interaction token cost, and how does it scale with conversation length?
  • Does the architecture use a single model for all tasks, or a cost-tiered approach?

Lock-in

  • Which components are proprietary, and which use open standards?
  • Where is data stored, and in what format is it exportable?
  • If you switched the LLM provider, what would need to be rewritten?

Control

  • What guardrails are in place, and who can modify them?
  • Is there a full audit log of tool calls and agent decisions?

When Architecture Reviews Matter Most

Not every AI agent project warrants a deep architectural review at procurement. A simple FAQ chatbot with no tool access and no data persistence is low-risk regardless of how it’s built.

The architectural stakes rise with:

  • Tool access to sensitive systems (CRM, ERP, financial data, customer records)
  • High interaction volume where cost-per-interaction compounds
  • Multi-step autonomous workflows where a bad early decision propagates
  • Regulatory exposure: anything touching personal data under the GDPR, the Swiss nFADP, or both (many Swiss businesses are subject to both regimes at once)

If your use case sits in any of these categories, understand the architecture before the contract, not after. For a broader look at good agent design, see “AI Agent Orchestration: Making Agents Work as a System” and “What Are AI Agents? A No-Hype Guide for Business Leaders”.

The build-vs-buy decision is also shaped by architecture: platforms trade configurability for speed; custom builds trade speed for control. “Build vs Buy: A Decision Framework for AI Agents” examines that trade-off directly.


What a Good Architecture Review Looks Like

At Orange ITS, when we assess an AI agent project — whether we’re building it or reviewing an existing system — we map the four components explicitly before writing a line of code. That means defining the planner’s model and fallback behaviour, specifying which tools the agent can invoke and under what conditions, designing the memory model for both performance and data residency, and writing the guardrail logic before the happy path.

It’s not glamorous. It’s also the reason that systems built this way don’t surprise their owners six months in with spiralling API bills, compliance questions, or a vendor saying migration will cost more than the original build.

If you’re evaluating an AI agent project and want a second opinion on a vendor proposal — or a clear-eyed view of what architecture fits your use case — a 30-minute call with our team is the fastest way to get there. We’ll map the four components against your requirements and flag where the design holds and where it carries hidden risk.
