Agent Console architecture

Agent Console’s architecture is designed for multi-tenancy, scalability, and governance. This page explains the technical components, design principles, and architectural patterns that enable Agent Console to deliver context-aware AI capabilities at enterprise scale. Understanding this architecture helps administrators configure, monitor, and optimise Agent Console deployments.

Design principles

Agent Console follows twelve core design principles that guide architectural decisions.

Tenant isolation first

Every data path enforces tenant boundaries.

Isolation is enforced at:

  • API gateway level (authentication and routing).

  • Agent execution environment (separate contexts per tenant).

  • Tool access (permissions scoped to tenant).

  • Vector stores and memory (tenant-specific collections).

  • Logs and telemetry (tagged with tenant ID).

  • Storage (GCS buckets or folders scoped to tenant).

This prevents data leakage between organisations and ensures compliance with data sovereignty requirements.

Regional services

Data-processing services operate within regions to meet data residency and latency requirements, whilst central platform management services may run globally.

This enables Agent Console to comply with regional data protection regulations whilst maintaining centralised platform operations for efficiency.

Gateway-mediated LLM access

No agent bypasses the execution gateway.

All AI model requests pass through a central gateway that enforces routing policies, tracks usage, and applies governance rules. This architecture ensures complete visibility into AI usage and prevents agents from making unmonitored model calls.

Provider-agnostic

Agents depend on capabilities, not specific models.

The execution gateway abstracts different AI provider APIs (Google, Anthropic, OpenAI, Adobe) behind a unified interface. Agents request capabilities (translation, image generation, text completion) rather than specific models, allowing the platform to route to the best available provider based on performance, cost, or availability.

Cost attribution by default

Every token is attributed to a tenant, user, and agent.

The execution gateway tracks input tokens, output tokens, the model used, and execution duration, and associates this usage with:

  • Tenant (for billing and quota management).

  • User (for individual usage tracking).

  • Agent (for capability-level cost analysis).

  • Workflow (for multi-step task attribution).

This granular tracking enables accurate cost allocation and usage-based billing.

Contract-driven agents

Every agent declares inputs, outputs, tools, and permissions.

Agents register with the platform using a contract that specifies:

  • Required inputs (types, formats, validation rules).

  • Supported outputs (formats, schemas).

  • Tools the agent uses (DAM queries, image manipulation, etc.).

  • Permissions required (read assets, write outputs, etc.).

  • Compatible AI models.

This contract enables the platform to validate requests, enforce permissions, and route tasks appropriately without knowing agent implementation details.
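
As a minimal sketch of how such a contract might be represented and checked, the example below uses a plain Python dataclass. The `AgentContract` class, its field names, the `validate_request` helper, and the permission strings are all hypothetical; they illustrate the idea and are not the platform's actual contract schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentContract:
    """Hypothetical contract shape; field names are illustrative only."""
    agent_id: str
    required_inputs: dict[str, type]      # input name -> expected Python type
    output_formats: tuple[str, ...]
    tools: frozenset[str] = frozenset()
    permissions: frozenset[str] = frozenset()
    compatible_models: tuple[str, ...] = ()


def validate_request(contract, inputs, granted_permissions):
    """Return a list of problems; an empty list means the request is acceptable."""
    problems = []
    for name, expected in contract.required_inputs.items():
        if name not in inputs:
            problems.append(f"missing input: {name}")
        elif not isinstance(inputs[name], expected):
            problems.append(f"wrong type for {name}: expected {expected.__name__}")
    # Permissions the contract requires but the caller has not been granted.
    for perm in sorted(contract.permissions - set(granted_permissions)):
        problems.append(f"missing permission: {perm}")
    return problems


# Example contract for an assumed translation agent.
translation = AgentContract(
    agent_id="translation-agent",
    required_inputs={"text": str, "target_language": str},
    output_formats=("text/plain",),
    tools=frozenset({"dam.query"}),
    permissions=frozenset({"assets:read", "outputs:write"}),
    compatible_models=("example-model",),
)
```

Because validation reads only the declared contract, the platform side of this sketch never needs to import or inspect the agent's implementation.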

Extensibility via Agent-to-Agent communication

Other teams can build and register agents without platform changes.

The hub-and-spoke architecture allows new domain agents to register their capabilities. The root orchestrator routes requests to appropriate agents based on their declared contracts. This enables rapid development of new capabilities without modifying core platform code.

Governance by default

AI outputs are identifiable and auditable, and require human approval before publishing.

Agents operate under defined policies governing:

  • Data access (which assets and context agents can query).

  • Tool usage (which capabilities agents can invoke).

  • Model selection (which AI providers agents can use).

  • Output lifecycle (draft state, approval requirements).

This governance ensures AI-generated content follows organisational policies.

Observe everything

Every agent invocation, prompt, tool call, and model interaction is traceable, measurable, and auditable.

Observability captures:

  • Agent execution traces (start, steps, completion, duration).

  • Prompts sent to AI models (for audit and optimisation).

  • Tool invocations (which capabilities were used).

  • Costs incurred (tokens, model, execution time).

  • Errors and failures (for debugging and reliability).

This data supports performance monitoring, cost optimisation, and compliance auditing.

Defence in depth

Isolation is enforced at every layer, not just one.

Security controls span:

  • Network (VPC isolation, private endpoints).

  • Authentication (OAuth2, JWT validation).

  • Authorisation (tenant-scoped permissions).

  • Data (encryption at rest and in transit).

  • Execution (sandboxed agent environments).

  • Audit (comprehensive logging).

Multiple security layers ensure that a breach at one level does not compromise the entire system.

Fail gracefully, recover fast

Agent workflows persist state and support retries, ensuring partial failures do not lose progress.

When an agent workflow fails mid-execution:

  • State is persisted to durable storage.

  • Completed steps are not re-executed.

  • Failed steps can be retried from the point of failure.

  • Users receive clear error messages indicating what failed and why.

This resilience prevents wasted work and enables rapid recovery from transient failures.
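
The behaviour described above can be sketched as a checkpointed workflow runner. This is an illustrative example only: the `run_workflow` function, the JSON state file, and the step signature are assumptions made for the sketch, not the platform's persistence mechanism.

```python
import json
from pathlib import Path


def run_workflow(steps, state_path):
    """Run named steps in order, skipping any already recorded as complete.

    `steps` is a list of (name, callable) pairs. Each callable receives the
    results so far and returns a JSON-serialisable value.
    """
    path = Path(state_path)
    state = json.loads(path.read_text()) if path.exists() else {"results": {}}
    for name, step in steps:
        if name in state["results"]:
            continue  # completed in an earlier run, so it is not re-executed
        try:
            state["results"][name] = step(state["results"])
        except Exception as exc:
            # Record what failed and why, persist progress, then surface a clear error.
            state["failed"] = {"step": name, "reason": str(exc)}
            path.write_text(json.dumps(state))
            raise RuntimeError(f"step '{name}' failed: {exc}") from exc
        state.pop("failed", None)
        path.write_text(json.dumps(state))  # persist after every completed step
    return state["results"]
```

Calling `run_workflow` again with the same state path resumes at the first step that has no recorded result, which is the retry-from-point-of-failure behaviour listed above.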

Prefer buy over build

Use managed services unless platform differentiation requires custom solutions.

Agent Console prioritises managed services for:

  • AI model hosting (Vertex AI).

  • Vector databases (managed options where available).

  • Observability (Cloud Trace, Cloud Logging).

  • Authentication (Keycloak or managed identity providers).

Custom solutions are built only when unique requirements (multi-tenancy patterns, specific governance needs) cannot be met by managed services.

Hub-and-spoke architecture

Agent Console uses a hub-and-spoke model where a root orchestrator coordinates specialised domain agents.

Root orchestrator

The root orchestrator (built using the Agent Development Kit) is the central hub.

When you submit a request, the orchestrator:

  1. Validates the request against authentication and authorisation rules.

  2. Determines which domain agents to invoke based on the task.

  3. Routes the request to appropriate agents.

  4. Coordinates multi-agent workflows.

  5. Aggregates responses.

  6. Persists state for recovery.

The orchestrator is the only component that users interact with directly. It provides a unified interface whilst delegating specialised tasks to domain agents.

Domain agents

Domain agents are specialists that handle specific capabilities.

Each domain agent focuses on a particular area:

  • Image manipulation agent: Cropping, background removal, object replacement.

  • Translation agent: Brand-aware translation across languages.

  • Copywriting agent: Content generation following brand guidelines.

  • Video processing agent: Cropping, trimming, format conversion.

Domain agents operate independently. They receive requests from the orchestrator, execute tasks, and return results. They do not communicate directly with users or other agents (except through the orchestrator).

Extensibility

This architecture enables teams across Storyteq to build and deploy new agents without modifying the core platform.

New agents:

  1. Implement the agent contract (declare inputs, outputs, tools, permissions).

  2. Register with the agent registry.

  3. Deploy to the execution environment.

The orchestrator immediately recognises the new agent and can route relevant requests to it. No changes to orchestrator code are required.
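
A minimal sketch of this registration pattern follows. The `AgentRegistry` and `Orchestrator` classes and the capability names are hypothetical and exist only to show how routing can depend on registered capabilities rather than on orchestrator code; the real orchestrator is built with the Agent Development Kit and is not shown here.

```python
class AgentRegistry:
    """Hypothetical in-memory registry keyed by declared capability."""

    def __init__(self):
        self._by_capability = {}

    def register(self, agent_id, capabilities, handler):
        for capability in capabilities:
            self._by_capability[capability] = (agent_id, handler)

    def resolve(self, capability):
        if capability not in self._by_capability:
            raise LookupError(f"no agent registered for '{capability}'")
        return self._by_capability[capability]


class Orchestrator:
    """Routes each request to whichever agent declared the capability."""

    def __init__(self, registry):
        self._registry = registry

    def handle(self, capability, payload):
        agent_id, handler = self._registry.resolve(capability)
        return {"agent": agent_id, "result": handler(payload)}


registry = AgentRegistry()
orchestrator = Orchestrator(registry)
registry.register("translation-agent", ["translate"], lambda p: f"[fr] {p['text']}")
# A new agent registers later; the Orchestrator class is unchanged.
registry.register("copywriting-agent", ["write_copy"], lambda p: f"Copy for {p['product']}")
```

In this sketch the orchestrator looks up the registry on every request, so a newly registered agent becomes routable without a restart or code change.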

Execution gateway

All AI model requests pass through an execution gateway that provides routing, tracking, and governance.

LLM Gateway (LiteLLM)

Agent Console uses LiteLLM as its execution gateway.

LiteLLM provides:

  • Unified API: Single interface for multiple AI providers (Google, Anthropic, OpenAI, Adobe).

  • Model routing: Route requests to different providers based on availability, cost, or performance.

  • Failover: Automatically retry failed requests or switch to alternative models.

  • Usage tracking: Track every token processed, the model used, and execution duration.

  • Cost attribution: Attribute usage to tenants, users, agents, and workflows.

  • Virtual keys: Tenant-specific API keys for quota enforcement.

No agent bypasses this gateway, ensuring complete visibility and control over AI usage and costs.
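
To illustrate the failover behaviour in isolation, the sketch below tries providers in order and falls through to the next one when a call fails. It is plain Python and deliberately does not use LiteLLM's API; `ProviderError`, `complete_with_failover`, and the provider callables are hypothetical names introduced for the example.

```python
class ProviderError(Exception):
    """Raised by a provider callable when a request cannot be served."""


def complete_with_failover(prompt, providers, retries_per_provider=1):
    """Try each (name, callable) provider in order and return the first success.

    Returns (provider_name, response). Raises ProviderError if every attempt fails.
    """
    errors = []
    for name, call in providers:
        # One initial attempt plus the configured number of retries.
        for attempt in range(1 + retries_per_provider):
            try:
                return name, call(prompt)
            except ProviderError as exc:
                errors.append(f"{name} attempt {attempt + 1}: {exc}")
    raise ProviderError("all providers failed: " + "; ".join(errors))
```

Returning the provider name alongside the response lets the caller record which model actually served the request, which usage tracking and cost attribution depend on.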

Provider abstraction

The gateway abstracts provider-specific APIs.

Agents request capabilities using a standard format. The gateway translates these requests into provider-specific API calls, handles authentication, and manages response formatting.

This abstraction allows Agent Console to:

  • Add new AI providers without modifying agents.

  • Switch providers based on performance or cost.

  • Use different models for different tenants based on their subscriptions.

  • Implement centralised rate limiting and quota management.

Cost and usage tracking

The gateway tracks comprehensive usage metrics.

For every request, the gateway records:

  • Input tokens (prompt size).

  • Output tokens (response size).

  • Model used (specific version and provider).

  • Execution duration (time from request to response).

  • Tenant ID (for cost attribution).

  • User ID (for individual tracking).

  • Agent ID (for capability-level analysis).

  • Workflow ID (for multi-step task tracking).

  • Timestamp (for reporting and analytics).

This data flows to the observability layer for reporting, billing, and optimisation.
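
The fields listed above map naturally onto a single usage record per request. The sketch below is one possible shape, with a helper that totals tokens along any attribution dimension; the `UsageRecord` class, its field names, and `tokens_by` are assumptions for illustration, not the gateway's actual schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageRecord:
    """Hypothetical per-request usage record."""
    tenant_id: str
    user_id: str
    agent_id: str
    workflow_id: str
    model: str
    input_tokens: int
    output_tokens: int
    duration_ms: int
    timestamp: str  # ISO 8601


def tokens_by(records, dimension):
    """Sum input and output tokens grouped by one attribution dimension."""
    totals = defaultdict(int)
    for record in records:
        totals[getattr(record, dimension)] += record.input_tokens + record.output_tokens
    return dict(totals)
```

For example, `tokens_by(records, "tenant_id")` would support billing and quota checks, while `tokens_by(records, "agent_id")` would support capability-level cost analysis from the same data.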

Memory and context

Agent Console implements memory at multiple levels to enable natural, context-aware interactions.

The goldfish problem

Without memory, every interaction is isolated.

Consider a conversation with an agent:

User: "Create a web banner for the Summer Sale." Agent: "Sure! For what kind of item?"

User: "Okay, I’ll get the standard summer templates ready."

Later in the same session:

User: "Hi" Agent: "Hello. How can I help?" (no recall of previous conversation)

The agent has no memory of the earlier request. Users must re-explain context with every interaction.

Session memory

Session memory remembers the current conversation.

With session memory:

User: "Create a web banner for the Summer Sale." Agent: "Okay, I’ll get the standard summer templates ready."

Later in the same session:

User: "Hi" Agent: "Hi! I remember your last design request was for a web banner for the Summer Sale. Since it’s winter now and the Christmas season, how about creating a Winter Sale banner to boost sales for the holidays?"

The agent recalls previous conversation context and proactively suggests relevant next steps.

Session memory stores:

  • Conversation history (user messages and agent responses).

  • User preferences stated in the session.

  • Intermediate results from multi-turn workflows.

Session memory is volatile and clears when the session ends.
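
A minimal sketch of volatile session memory is shown below, assuming an in-process dictionary keyed by session ID. The `SessionMemory` class and `build_prompt` helper are hypothetical and stand in for whatever session service a deployment actually uses.

```python
class SessionMemory:
    """In-process, volatile store: contents are discarded when the session ends."""

    def __init__(self):
        self._sessions = {}

    def append(self, session_id, role, text):
        self._sessions.setdefault(session_id, []).append({"role": role, "text": text})

    def history(self, session_id):
        return list(self._sessions.get(session_id, []))

    def end_session(self, session_id):
        self._sessions.pop(session_id, None)


def build_prompt(memory, session_id, new_message):
    """Prefix the new user message with earlier turns from the same session."""
    lines = [f"{turn['role']}: {turn['text']}" for turn in memory.history(session_id)]
    lines.append(f"user: {new_message}")
    return "\n".join(lines)
```

While the session is open, a later "Hi" is sent to the model together with the earlier Summer Sale exchange; after `end_session`, the same message is sent with no history.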

Long-term memory (Memory Bank)

Long-term memory persists context across sessions.

Memory Bank stores:

  • User preferences (preferred models, common tasks, output formats).

  • Historical interactions (for learning common patterns).

  • Project context (ongoing campaigns, brand contexts frequently used).

When a user returns weeks later, the agent can recall their preferences and working context without re-explanation.

Long-term memory is durable, stored in vector databases, and scoped to individual users within their tenant boundaries.

RAG and context engineering

Retrieval-Augmented Generation (RAG) enables agents to work with brand context.

Knowledge base

Client administrators upload brand context documents:

  • Brand guidelines (tone of voice, visual identity, typography).

  • Approved terminology databases.

  • Regulatory compliance rules.

  • Campaign briefs and messaging frameworks.

These documents are processed through a RAG pipeline:

  1. Chunking: Documents are split into semantically meaningful segments.

  2. Embedding: Each segment is converted to a vector representation using an embedding model.

  3. Indexing: Vectors are stored in a vector database with tenant isolation.
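
The chunking step can be sketched with a simple paragraph-based splitter. This is an assumption-laden illustration: it treats blank-line-separated paragraphs as the semantic unit and packs them into chunks under a word budget, whereas a production pipeline may use a different segmentation strategy. The `chunk_document` function and its `max_words` parameter are hypothetical.

```python
def chunk_document(text, max_words=50):
    """Split on blank lines, then pack whole paragraphs into chunks.

    Each chunk holds at most max_words words, except that a single paragraph
    longer than max_words is kept whole as its own chunk.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, current, count = [], [], 0
    for paragraph in paragraphs:
        words = len(paragraph.split())
        # Start a new chunk if adding this paragraph would exceed the budget.
        if current and count + words > max_words:
            chunks.append("\n\n".join(current))
            current, count = [], 0
        current.append(paragraph)
        count += words
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Keeping paragraphs intact means a guideline such as a tone-of-voice rule is never split across two vectors, at the cost of uneven chunk sizes.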

Context retrieval

When an agent executes, it queries the knowledge base for relevant context.

For example, a translation agent receives: "Translate this product description to French."

The agent:

  1. Generates an embedding for the query.

  2. Searches the vector database for relevant brand context (French market tone of voice, approved terminology, regulatory requirements).

  3. Retrieves the top-ranked context segments.

  4. Stitches retrieved context into the prompt sent to the AI model.

The AI model receives both the translation request and brand-specific guidance, producing brand-aligned output.
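
The four retrieval steps can be sketched end to end with a toy embedding. The word-count `embed` function below is a stand-in for a real embedding model, and the in-memory `index` dictionary is a stand-in for a tenant-isolated vector database; `retrieve` and `stitch_prompt` are hypothetical helper names.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: lower-cased word counts. A real system would call an embedding model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(index, tenant_id, query, top_k=2):
    """Rank only the given tenant's segments against the query."""
    query_vec = embed(query)
    scored = [(cosine(query_vec, vec), text) for text, vec in index.get(tenant_id, [])]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for score, text in scored[:top_k] if score > 0]


def stitch_prompt(request, segments):
    """Place retrieved context ahead of the task in the prompt."""
    context = "\n".join(f"- {segment}" for segment in segments)
    return f"Brand context:\n{context}\n\nTask: {request}"


# Illustrative index: tenant ID -> list of (segment text, vector).
index = {
    "tenant-a": [(s, embed(s)) for s in [
        "French market tone of voice: formal, use vous.",
        "Approved French terminology: use 'panier' for basket.",
        "German market tone of voice: direct and concise.",
    ]],
    "tenant-b": [(s, embed(s)) for s in ["French tone of voice: playful."]],
}
```

Because `retrieve` only ever reads the list stored under the caller's tenant ID, segments belonging to another tenant cannot appear in the stitched prompt, whatever the query.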

Context abstraction layer

Agents access context through an abstraction layer, not directly from storage.

This enables Agent Console to evolve from document-based RAG to structured knowledge graphs (Brand & Campaign Intelligence) without breaking agents.

Future context sources might include:

  • Structured brand objects (machine-readable brand guidelines).

  • Campaign knowledge graphs (relationships between campaigns, products, audiences).

  • Real-time data (current approvals, active campaigns, usage rights status).

The abstraction layer handles these different sources whilst presenting a consistent interface to agents.
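
One way to picture the abstraction layer is as a small interface that every context source implements. The sketch below uses a `typing.Protocol`; the `ContextSource` interface, both implementing classes, and `agent_prompt` are hypothetical and only illustrate that an agent can be written against the interface rather than against a storage backend.

```python
from typing import Protocol


class ContextSource(Protocol):
    def get_context(self, tenant_id: str, query: str) -> list[str]: ...


class DocumentContext:
    """Stands in for document-based RAG retrieval (naive keyword match here)."""

    def __init__(self, segments_by_tenant):
        self._segments = segments_by_tenant

    def get_context(self, tenant_id, query):
        words = set(query.lower().split())
        return [s for s in self._segments.get(tenant_id, []) if words & set(s.lower().split())]


class BrandObjectContext:
    """Stands in for a structured, machine-readable brand object."""

    def __init__(self, brand_by_tenant):
        self._brands = brand_by_tenant

    def get_context(self, tenant_id, query):
        brand = self._brands.get(tenant_id, {})
        return [f"{key}: {value}" for key, value in sorted(brand.items())]


def agent_prompt(source: ContextSource, tenant_id: str, task: str) -> str:
    """The agent depends only on the ContextSource interface."""
    return "\n".join(source.get_context(tenant_id, task) + [task])
```

Swapping `DocumentContext` for `BrandObjectContext` changes what context is returned but requires no change to `agent_prompt`, which is the property the abstraction layer is meant to preserve.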

Tools and capabilities

Agents use tools to accomplish tasks beyond text generation.

Tool types

Agent Console provides several tool categories:

Function tools: Simple, stateless API calls decorated as functions (e.g., date formatting, currency conversion).

Authenticated function tools: Tools requiring user context and permissions (e.g., querying DAM for approved assets with user-specific access rights).

MCP tools: Heavy computation or reusable services running as separate servers (e.g., image manipulation service, video processing pipeline).

Agent-as-tool: Code execution capabilities for open-ended analysis or data transformation.

Tool selection decision tree

The architecture uses a decision tree to determine appropriate tool implementation:

Is the task open-ended or unpredictable? → Yes: Use agent-as-tool (code execution) → No: Continue

Does it require heavy computation or is it a reusable service? → Yes: Use MCP tool (separate server) → No: Continue

Is it a simple, stateless API call? → Yes: Use function tool (decorated function) → No: Continue

Does it require user authentication? → Yes: Use authenticated function tool (with ToolContext) → No: Fall back to function tool

This decision tree ensures tools are implemented at appropriate abstraction levels.

Tools Gateway

The Tools Gateway provides agents with controlled access to capabilities whilst enforcing permissions and tracking usage.

For each tool invocation, the gateway:

  1. Validates that the agent has permission to use the tool.

  2. Checks that the user has appropriate access rights (for authenticated tools).

  3. Invokes the tool with provided parameters.

  4. Tracks usage (tool called, duration, success/failure).

  5. Returns results to the agent.

Tool usage is logged for observability and audit purposes.
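
The five steps above can be sketched as a single `invoke` method. Everything here is illustrative: the `ToolsGateway` class, the shape of its permission tables, and the usage log entries are assumptions, not the platform's implementation.

```python
import time


class ToolsGateway:
    """Hypothetical gateway; names and data shapes are illustrative."""

    def __init__(self, tools, agent_permissions):
        self._tools = tools                          # tool name -> (callable, required user right or None)
        self._agent_permissions = agent_permissions  # agent ID -> set of permitted tool names
        self.usage_log = []

    def invoke(self, agent_id, user_rights, tool_name, **params):
        # Step 1: the agent must be permitted to use the tool.
        if tool_name not in self._agent_permissions.get(agent_id, set()):
            raise PermissionError(f"{agent_id} may not use {tool_name}")
        func, required_right = self._tools[tool_name]
        # Step 2: authenticated tools also require a user access right.
        if required_right is not None and required_right not in user_rights:
            raise PermissionError(f"user lacks {required_right}")
        started = time.perf_counter()
        success = False
        try:
            result = func(**params)  # Step 3: invoke with the provided parameters.
            success = True
            return result            # Step 5: return results to the agent.
        finally:
            # Step 4: record usage whether the tool succeeded or raised.
            self.usage_log.append({
                "tool": tool_name,
                "agent": agent_id,
                "duration_s": time.perf_counter() - started,
                "success": success,
            })
```

In this sketch, denied requests raise before any usage entry is written; a real gateway would likely log denials as well for audit purposes.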

Observability

Every agent invocation, AI model request, tool usage, and workflow execution is tracked and logged.

Observability stack

Agent Console uses Google Cloud observability services:

  • Cloud Trace: Distributed tracing for request flows across components.

  • Cloud Logging: Structured logs for all platform events.

  • Cloud Monitoring: Metrics dashboards and alerting.

These services are augmented with New Relic for application performance monitoring.

What is observed

Observability captures comprehensive platform activity:

Agent execution traces:

  • Request received (timestamp, tenant, user, agent).

  • Steps executed (sequence, duration, success/failure).

  • Tools invoked (which capabilities, parameters, results).

  • AI model calls (prompts, responses, tokens, costs).

  • Final output (format, size, success/failure).

Usage metrics:

  • Tokens consumed (by tenant, user, agent, model).

  • Execution times (by agent, workflow, tool).

  • Success rates (completions vs failures).

  • Cost per execution (actual spend by tenant).

Errors and failures:

  • Exception details (type, message, stack trace).

  • Context (what the agent was doing when the failure occurred).

  • Recovery actions (retry attempts, fallback models).

Usage reports

Administrators use observability data to:

  • Monitor workspace usage (consumption by agent, remaining allowance).

  • Track costs (actual spend vs budget).

  • Optimise performance (identify slow agents or expensive workflows).

  • Audit compliance (verify governance policies are enforced).

  • Forecast capacity (predict future usage trends).

Security and tenant isolation

Agent Console enforces isolation at multiple layers to prevent data leakage between tenants.

Authentication and authorisation

Users authenticate via Keycloak (or a managed identity provider) and receive JWT tokens.

The Access API intercepts requests and translates the JWT into a Requesting Party Token (RPT) that enforces tenant isolation.

Every request includes tenant context, ensuring users can only access data within their organisation’s boundaries.

Network isolation

Agent Console can deploy with VPC isolation:

  • Private endpoints for internal services.

  • Network policies restricting traffic between components.

  • No direct internet access for execution environments.

This prevents unauthorised external access and contains potential security breaches.

Execution isolation

Agent execution environments are isolated per tenant:

  • Separate containers or namespaces.

  • Resource quotas preventing noisy neighbour issues.

  • Sandboxed code execution for agent-as-tool capabilities.

Execution isolation ensures one tenant’s workload cannot impact another’s performance or access another’s data.

Data isolation

Tenant data is isolated in storage systems:

  • Vector databases use collection-per-tenant or partition-per-tenant patterns.

  • Object storage uses tenant-scoped buckets or folder hierarchies.

  • Logs and telemetry are tagged with tenant ID for access control.

Database queries automatically filter by tenant ID to prevent cross-tenant data access.
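
Automatic tenant filtering can be sketched as a wrapper that binds every read and write to one tenant. The `TenantScopedStore` class and the list-of-dictionaries backend are hypothetical; they stand in for whatever database layer a deployment uses.

```python
class TenantScopedStore:
    """Hypothetical wrapper: every read and write is bound to one tenant."""

    def __init__(self, backend, tenant_id):
        self._backend = backend      # shared list of rows, each a dict with a tenant_id key
        self._tenant_id = tenant_id

    def insert(self, row):
        # The tenant ID is set by the store and overrides any value supplied by the caller.
        self._backend.append({**row, "tenant_id": self._tenant_id})

    def query(self, **filters):
        # The tenant condition is always applied in addition to the caller's filters.
        return [
            row for row in self._backend
            if row["tenant_id"] == self._tenant_id
            and all(row.get(key) == value for key, value in filters.items())
        ]
```

Because callers never construct the tenant condition themselves, a forgotten filter or a caller-supplied tenant ID cannot widen a query beyond the tenant the store was created for.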

Model Armor

Model Armor provides safety and governance at the inference layer.

Model Armor intercepts prompts before they reach AI models, and responses before they reach users, and enforces:

  • Input scanning: Detect prompt injection attempts, harmful content, or sensitive data leakage.

  • Output scanning: Check AI responses for fabrication, bias, or non-compliant content.

  • Fabrication detection: Identify hallucinations or factually incorrect outputs.

  • Bias detection: Flag potentially biased or discriminatory responses.

  • Sensitive data detection: Prevent output of personal information, credentials, or confidential data.

When Model Armor detects issues, it can block the request, sanitise the output, or route it to a human review queue.

This governance layer ensures AI outputs meet safety and compliance standards before reaching users.

Deployment options

Agent Console supports multiple deployment patterns to meet different scalability and operational requirements.

Vertex AI Agent Engine

Vertex AI Agent Engine is a fully managed runtime for deploying and scaling agents.

Benefits:

  • No infrastructure management (auto-scaling, monitoring included).

  • Built-in session management and Memory Bank.

  • Secure code execution sandbox for agent-as-tool capabilities.

  • Native observability with Cloud Trace and Cloud Logging.

  • Per-agent IAM identities for least-privilege security.

Limitations:

  • Python-only (no TypeScript support).

  • Agents must conform to Agent Engine patterns.

  • Quota limitations for high-scale multi-tenancy.

  • Agent-to-Agent communication is in preview.

Vertex AI Agent Engine is suitable for prototypes and moderate-scale deployments where managed convenience outweighs quota constraints.

Google Kubernetes Engine (GKE)

GKE provides full control over agent runtime and scaling.

Benefits:

  • Language freedom (TypeScript, Python, Go, any runtime).

  • Full control over scaling (HPA, VPA, node pools, GPU scheduling).

  • Network-level isolation (Istio, NetworkPolicies, private clusters).

  • Can use Vertex AI services (Sessions, Memory Bank) via REST API without deploying to Agent Engine runtime.

  • Collection-per-tenant isolation patterns.

  • Can self-host execution gateway (LiteLLM).

Limitations:

  • Significant infrastructure operations (node management, Istio mesh, HPA tuning, upgrades).

  • Must implement custom session management (DatabaseSessionService with PostgreSQL/MySQL/Memorystore).

  • Must implement custom Memory Bank (BaseMemoryService with vector database).

  • Schema migration risk (Agent Development Kit changes require manual updates).

GKE is the recommended deployment option for production multi-tenant SaaS deployments where control, scalability, and language flexibility are priorities.