TL;DR — What Makes an Agent
The "AI wrapper" era is ending. In 2023, you could ship a ChatGPT wrapper and call it a product. In 2026, users expect AI systems that actually do things — research a topic across multiple sources, write and execute code, book a meeting, file a support ticket, update a database record. That's agentic behavior.
Building agents isn't dramatically harder than building a chat feature. But it requires a different mental model. You're not building a request-response endpoint. You're building a loop — a system that reasons, acts, observes the result, and reasons again.
What Actually Makes Something an Agent
The word "agent" gets used loosely. Here's a precise definition: an agent is an LLM system that can take actions in the world based on reasoning, observe the results of those actions, and use those observations to decide what to do next — in a loop — until a goal is achieved.
By this definition, a basic chatbot is not an agent. It takes one input, produces one output. An agent with web search is borderline — it can call one tool, but if it can't reason about the search results and decide to search again with a refined query, it's still just a fancy chatbot.
Every agent is built from four components.
The reasoning engine: the model that decides which tool to call, what arguments to pass, whether the result was sufficient, and when to stop. This is where your model choice matters most — tool-calling quality varies significantly between models.
Tools: structured JSON schemas that describe what the agent can do. The model selects tools and generates arguments; your code executes them. Tools are the hands of the agent.
Memory: short-term memory is the conversation buffer — recent messages that fit in the context window. Long-term memory is a vector database: embeddings of past interactions and documents the agent can retrieve semantically.
The loop: the agent loop runs reason → act → observe → reason. It exits when the LLM signals completion (a "finish" tool call or a FINAL_ANSWER marker), or when a hard iteration limit is hit. Never ship an agent without a max-iterations guard.
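The loop can be sketched in plain TypeScript, independent of any framework. Here `reason` and `act` are hypothetical stand-ins for your LLM call and your tool layer:

```typescript
// Minimal ReAct-style loop: reason → act → observe → repeat.
// `reason` and `act` are stand-ins for the LLM call and tool execution.
type Step =
  | { type: 'tool_call'; name: string; args: unknown }
  | { type: 'final_answer'; text: string };

async function runLoop(
  reason: (history: string[]) => Promise<Step>,
  act: (name: string, args: unknown) => Promise<string>,
  goal: string,
  maxIterations = 10, // the guard: never ship without it
): Promise<string> {
  const history = [goal];
  for (let i = 0; i < maxIterations; i++) {
    const step = await reason(history);                  // reason
    if (step.type === 'final_answer') return step.text;  // model signals completion
    const observation = await act(step.name, step.args); // act
    history.push(`observed: ${observation}`);            // observe, then loop
  }
  throw new Error(`Agent exceeded ${maxIterations} iterations`);
}
```

Everything else in this article is elaboration on this loop: better reasoning, better tools, better memory, and guardrails around it.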
The 3 Agent Patterns
Not all agents are built the same way. There are three dominant architectural patterns, each with different trade-offs:
ReAct (reason + act): the model reasons about what to do, calls a tool, observes the result, then reasons again. Simple, interpretable, and works well for most tasks. The loop continues until the model emits a final answer. This is the pattern you should start with.
Plan-and-execute: a planning LLM first generates a step-by-step plan, and an execution LLM then carries out each step. This separates strategic thinking from tactical execution, which improves quality on long-horizon tasks but adds latency and cost.
Multi-agent orchestration: an orchestrator agent routes subtasks to specialized agents — a research agent, a writing agent, a code agent. Each specialist has its own tool set and context. This scales to very complex problems but adds significant coordination overhead.
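The orchestration pattern reduces to a small routing loop. In this sketch, `route` stands in for an LLM call that classifies each subtask, and each specialist is itself an agent with its own tools:

```typescript
// Sketch of the orchestrator pattern. `route` and the specialists are
// hypothetical stand-ins for LLM-backed agents, not a library API.
type Specialty = 'research' | 'writing' | 'code';

interface SpecialistAgent {
  run(task: string): Promise<string>;
}

async function orchestrate(
  route: (task: string) => Promise<Specialty>, // router LLM (stand-in)
  specialists: Record<Specialty, SpecialistAgent>,
  subtasks: string[],
): Promise<string[]> {
  const results: string[] = [];
  for (const task of subtasks) {
    const specialty = await route(task);               // orchestrator decides
    results.push(await specialists[specialty].run(task)); // specialist executes
  }
  return results;
}
```

Note that every routing decision is an extra LLM call: that is the coordination overhead in concrete terms.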
Building a ReAct Agent with Next.js
The Server Action approach works well for agents in Next.js — it keeps all LLM logic server-side (your API key never touches the client), and you can stream the response back using the AI SDK.
Defining Tools
// src/lib/agent/tools.ts
import { tool } from 'ai';
import { z } from 'zod';

// fetchSearchResults and db are your app's own helpers (not shown here).
export const tools = {
  search_web: tool({
    description: 'Search the web for current information on a topic.',
    parameters: z.object({
      query: z.string().describe('The search query'),
    }),
    execute: async ({ query }) => {
      const results = await fetchSearchResults(query);
      return results.slice(0, 5).map((r) => ({
        title: r.title,
        url: r.url,
        snippet: r.snippet,
      }));
    },
  }),

  query_database: tool({
    description: 'Query the application database for user or product data.',
    parameters: z.object({
      table: z.enum(['users', 'orders', 'products']),
      filter: z.record(z.string()).optional(),
      limit: z.number().min(1).max(50).default(10),
    }),
    execute: async ({ table, filter, limit }) => {
      return await db.select(table, { where: filter, limit });
    },
  }),
};

The Agent Loop (Server Action)
// src/app/actions/agent.ts
'use server';

import { streamText } from 'ai';
import { google } from '@ai-sdk/google';
import { tools } from '@/lib/agent/tools';
import { createStreamableValue } from 'ai/rsc';

const SYSTEM_PROMPT = `You are a helpful AI assistant with access to tools.
Think step by step. Use tools when needed. When you have a final answer,
respond directly without calling any more tools.`;

// `Message` is your chat message type (e.g. CoreMessage from 'ai').
export async function runAgent(messages: Message[]) {
  const stream = createStreamableValue('');

  (async () => {
    const result = await streamText({
      model: google('gemini-2.0-flash'),
      system: SYSTEM_PROMPT,
      messages,
      tools,
      maxSteps: 10, // Hard limit: never run more than 10 tool calls
      onStepFinish({ text, toolCalls, toolResults, finishReason }) {
        // Log for observability
        console.log('[agent step]', { toolCalls: toolCalls?.length, finishReason });
      },
    });

    for await (const delta of result.textStream) {
      stream.update(delta);
    }
    stream.done();
  })();

  return { output: stream.value };
}

The maxSteps parameter limits how many tool call + response cycles can happen before the model is forced to produce a final answer. Set it low (5–10) for most agents. An agent that needs 20+ steps to answer a question is a design problem, not a capability problem.

Adding Memory
An agent without memory starts fresh on every interaction. That works for one-shot tasks. For anything that spans multiple sessions — a personal assistant, a customer support agent that knows your history, an onboarding assistant — you need both memory tiers.
Short-Term: Conversation Buffer
Short-term memory is just the message array you pass to the model. Truncate it when it exceeds the context window, keeping the system prompt and most recent N exchanges. A simple approach: keep the last 20 messages. A better approach: summarize old messages into a compressed context block.
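The simple strategy can be sketched like this (the `ChatMessage` shape here is illustrative):

```typescript
interface ChatMessage {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

// Keep the system prompt plus the most recent `keep` messages.
function truncateHistory(messages: ChatMessage[], keep = 20): ChatMessage[] {
  const system = messages.filter((m) => m.role === 'system');
  const rest = messages.filter((m) => m.role !== 'system');
  return [...system, ...rest.slice(-keep)];
}
```

The summarization variant replaces the dropped messages with one compressed summary message instead of discarding them.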
Long-Term: Vector Search
// src/lib/agent/memory.ts
import { embed } from 'ai';
import { google } from '@ai-sdk/google';

// Store a memory
export async function storeMemory(userId: string, content: string) {
  const { embedding } = await embed({
    model: google.textEmbeddingModel('text-embedding-004'),
    value: content,
  });
  await db.insert('agent_memories', {
    user_id: userId,
    content,
    embedding,
    created_at: new Date(),
  });
}

// Recall relevant memories
export async function recallMemories(userId: string, query: string, limit = 5) {
  const { embedding } = await embed({
    model: google.textEmbeddingModel('text-embedding-004'),
    value: query,
  });
  // Vector similarity search (pgvector, Pinecone, Weaviate, etc.)
  return await db.vectorSearch('agent_memories', {
    embedding,
    filter: { user_id: userId },
    limit,
  });
}

Before each agent run, retrieve the top-K relevant memories for the current query and inject them into the system prompt as context. This gives the agent "personalized knowledge" without blowing up the context window with the full history.
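The injection step might look like this sketch (it assumes `recallMemories` returns rows with a `content` field; the exact shape depends on your store):

```typescript
interface MemoryRow {
  content: string;
}

// Fold recalled memories into the system prompt before the agent runs.
function withMemories(basePrompt: string, memories: MemoryRow[]): string {
  if (memories.length === 0) return basePrompt;
  const block = memories.map((m) => `- ${m.content}`).join('\n');
  return `${basePrompt}\n\nRelevant context from past interactions:\n${block}`;
}
```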
Tools That Agents Actually Need
The tool set defines what your agent can do. The examples above cover the two most common categories, retrieval (web search) and data access (database queries); production agents often add action tools that send messages, update records, or execute code. For each tool, think about input validation, side effects, and how clearly its description tells the model when to use it.
Production Concerns
Demos run cleanly. Production agents hit timeouts, exhaust budgets, encounter malformed tool outputs, and get stuck in loops. Here's what to build for before launch:
Timeouts: set both a wall-clock timeout (30–60s for most user-facing agents) and a max-steps limit. Serverless platforms cap request duration (on Vercel the limit depends on your plan and function configuration), so if your agent might run longer, use background jobs (Inngest, Trigger.dev, QStash) and push updates via streaming or webhooks.
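A wall-clock guard can be sketched with `Promise.race`; pair it with `maxSteps`, since neither replaces the other:

```typescript
// Race the agent run against a deadline so a stuck run fails fast
// instead of holding the request open.
async function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`Agent run timed out after ${ms}ms`)),
      ms,
    );
  });
  try {
    return await Promise.race([work, deadline]);
  } finally {
    clearTimeout(timer); // don't leave the timer holding the process open
  }
}
```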
Cost control: count tokens before and after each agent run. Set per-user and per-organization daily token budgets. Log every LLM call with token counts. Agents are multiplicative — a 10-step agent run can cost 10x a single completion. Instrument before you get surprised by your first bill.
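A per-user budget check is simple to sketch; the in-memory map here stands in for a durable counter (Redis, Postgres) that you'd reset on a daily schedule:

```typescript
// Tracks tokens used per user; throws before a run that would
// exceed the daily budget. Swap the Map for a durable store.
const tokensUsed = new Map<string, number>();

function chargeTokens(userId: string, tokens: number, dailyBudget: number): number {
  const used = tokensUsed.get(userId) ?? 0;
  if (used + tokens > dailyBudget) {
    throw new Error(`Daily token budget (${dailyBudget}) exceeded for ${userId}`);
  }
  tokensUsed.set(userId, used + tokens);
  return used + tokens;
}
```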
Error handling: tools will fail. Your search API will rate-limit. Your database will time out. Define a standard error format for all tools, and instruct the model in the system prompt on what to do when a tool fails: retry once, use an alternative tool, or ask the user for clarification.
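One way to standardize this is an envelope that every tool returns, so failures reach the model as data it can reason about instead of exceptions that kill the run (a sketch):

```typescript
// Standard tool result envelope: the model always sees { ok, ... }.
type ToolResult<T> =
  | { ok: true; data: T }
  | { ok: false; error: string };

// Wrap any tool's execute function so it never throws into the loop.
async function safeExecute<T>(fn: () => Promise<T>): Promise<ToolResult<T>> {
  try {
    return { ok: true, data: await fn() };
  } catch (err) {
    return { ok: false, error: err instanceof Error ? err.message : String(err) };
  }
}
```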
Observability: you cannot debug an agent you cannot observe. Log every step: which tool was called, with what arguments, what it returned, and how long it took. Use LangSmith, Langfuse, or a custom logging layer. You'll need this trace the first time a user files a "the AI did something wrong" bug report.
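A minimal trace record per step, enough to answer "which tool, what arguments, how long", could look like this sketch (the `log` sink is whatever you use: console, Langfuse, your database):

```typescript
interface StepTrace {
  step: number;
  tool: string;
  args: unknown;
  durationMs: number;
  ok: boolean;
}

// Wrap a tool execution and emit one trace record per step.
async function tracedStep(
  step: number,
  tool: string,
  args: unknown,
  exec: () => Promise<string>,
  log: (trace: StepTrace) => void,
): Promise<string> {
  const start = Date.now();
  try {
    const result = await exec();
    log({ step, tool, args, durationMs: Date.now() - start, ok: true });
    return result;
  } catch (err) {
    log({ step, tool, args, durationMs: Date.now() - start, ok: false });
    throw err; // surface the failure after recording it
  }
}
```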
The Real-World Agent Architecture
Here's the full system picture for a production-grade agent in Next.js: request → Server Action (authorization) → memory retrieval → agent loop → tool executor → streamed response, with an observability layer logging every step.
Each layer has a single responsibility. The Server Action handles authorization. The memory layer handles context injection. The agent loop handles reasoning. The tool executor handles action. The observability layer handles debugging. Keep them separate — not in one giant function.
Conclusion
Building an AI agent with Next.js is primarily an architectural problem, not a model problem. The LLM reasoning is the easy part — the major models handle tool calling well. The hard part is the infrastructure around it: memory systems, tool sandboxing, cost controls, timeout handling, and observability.
Start with the ReAct pattern and a small tool set. Get one end-to-end flow working with logging and a max-iterations guard. Then add memory, then additional tools, then — if you need it — a more complex multi-agent architecture. The mistake is starting with a complex architecture before the basics work.
The market right now is full of LLM wrappers. What it needs is agents that actually work in production — that handle failure, respect budgets, and produce results users can trust. Build that and you're in a different category entirely.
