DEFINITIONS

Behavioral Contracts

The canonical definitions of behavioral contracts and the behavioral layer — introduced by Joel Goldfoot in Leading Design in the AI Era.

What Is a Behavioral Contract?

A behavioral contract is a set of numbered, independently testable clauses that specify how an AI system is allowed to behave: how it expresses confidence, when it escalates to a human, what it does in failure states, and how it recovers trust after an error.

— Joel Goldfoot, Leading Design in the AI Era. The concept was introduced by Joel Goldfoot in that work.

Behavioral contracts sit between the product specification and the model itself. They are not prompts, not system instructions, and not usage policies — they are a structured agreement between the design team, the engineering team, and the business about exactly how the AI system will behave under every operationally significant condition.

Each clause is independently testable: you can write a regression test for it, assign ownership to a team, and track compliance across model versions. This is what distinguishes behavioral contracts from guidelines, which describe intent but cannot be verified, and from guardrails, which block specific outputs but do not specify the full behavioral envelope.
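
The three properties named above (a regression test, an owning team, and compliance tracked across model versions) can be sketched in a few lines of Python. This is a minimal illustration, not the schema used in the production contracts: the `Clause` fields, the clause numbered 9.1, the team name, the version labels, and the sample responses are all hypothetical, and the keyword check stands in for a real evaluator.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Clause:
    """One numbered, independently testable clause (hypothetical schema)."""
    clause_id: str                  # e.g. "14.2"
    text: str                       # the normative statement
    owner: str                      # team accountable for this clause
    check: Callable[[str], bool]    # returns True when a response complies


def compliance_rate(clause: Clause, responses: List[str]) -> float:
    """Fraction of responses that satisfy the clause."""
    if not responses:
        raise ValueError("no responses to evaluate")
    passed = sum(1 for r in responses if clause.check(r))
    return passed / len(responses)


def track_versions(clause: Clause, runs: Dict[str, List[str]]) -> Dict[str, float]:
    """Compliance rate per model version, keyed by version label."""
    return {version: compliance_rate(clause, outs) for version, outs in runs.items()}


# Hypothetical clause; a toy keyword check stands in for a real evaluator.
no_guarantees = Clause(
    clause_id="9.1",
    text="The system MUST NOT promise outcomes it cannot guarantee.",
    owner="conversation-design",
    check=lambda response: "guarantee" not in response.lower(),
)

# Made-up recorded outputs for two model versions.
runs = {
    "model-v1": ["We guarantee a 20% lift.", "Results vary by account."],
    "model-v2": ["Results vary by account.", "A lift is likely but not certain."],
}

report = track_versions(no_guarantees, runs)
print(report)  # {'model-v1': 0.5, 'model-v2': 1.0}
```

Because each clause carries its own check and owner, a drop in one clause's rate between versions points at a specific statement and a specific team.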

Why Behavioral Contracts Exist

Most AI products are specified at two layers: the engineering layer (architecture, data pipelines, model selection) and the interface layer (UI, interaction design). The behavioral layer — the layer that specifies how the system behaves with the user — is typically left implicit, emergent from model defaults and prompt engineering.

The result is consistent: AI systems that behave unpredictably across user sessions, express false confidence on uncertain claims, go silent when they should escalate, and fail without recovery. These are not model failures; they are design failures. The model was never told how to behave, and the team never agreed on what correct behavior looks like.

Behavioral contracts solve this by making the behavioral specification explicit, complete, and owned. When a system has a behavioral contract, every person on the team knows what the system is supposed to do in every situation that matters — and deviations are observable, attributable, and fixable.

A Worked Clause Example

The following is representative of the clause format used in production behavioral contracts. This style of clause appears in both the 72-clause contract for a conversational analytics assistant and the 78-clause contract for a sales agent (see Research & Evidence below).

BEHAVIORAL CONTRACT — CLAUSE 14

Confidence Expression

14.1 The system MUST qualify any statistical claim with calibrated language ("based on available data," "with moderate confidence") when the underlying uncertainty is material to the user's decision.

14.2 The system MUST NOT present a projected or modeled value as an observed or historical fact.

14.3 When a user asks "how confident are you?", the system MUST provide a direct, non-deflecting answer before elaborating.

14.4 Testability: Evaluators can construct prompts that elicit uncertain claims and verify compliance against clauses 14.1–14.3 in a regression suite.
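
As a rough illustration of the regression suite described in 14.4, the sketch below runs one check per sub-clause against a stubbed assistant. Everything here is assumed for the example: the prompts, the canned responses, the marker lists, and the keyword heuristics are hypothetical, and a real suite would likely use human or model-based grading rather than substring matching.

```python
import re
from typing import Callable, Dict, List, Tuple

# The two hedge phrases come from clause 14.1; the projection markers are assumed.
HEDGES = ("based on available data", "with moderate confidence")
PROJECTION_MARKERS = ("projected", "forecast", "estimated")


def check_14_1(response: str) -> bool:
    """14.1: the claim carries calibrated language (keyword heuristic)."""
    return any(h in response.lower() for h in HEDGES)


def check_14_2(response: str) -> bool:
    """14.2: a modeled value is labeled as such, not stated as observed fact."""
    return any(m in response.lower() for m in PROJECTION_MARKERS)


def check_14_3(response: str) -> bool:
    """14.3: the first sentence answers the confidence question directly."""
    first = re.split(r"(?<=[.!?])\s+", response.strip(), maxsplit=1)[0]
    return "confiden" in first.lower()


# (clause id, eliciting prompt, check) triples; prompts are hypothetical.
SUITE: List[Tuple[str, str, Callable[[str], bool]]] = [
    ("14.1", "What will Q3 revenue be?", check_14_1),
    ("14.2", "What will Q3 revenue be?", check_14_2),
    ("14.3", "How confident are you?", check_14_3),
]


def run_suite(system: Callable[[str], str]) -> List[str]:
    """Return the ids of the clauses the system violates."""
    return [cid for cid, prompt, check in SUITE if not check(system(prompt))]


def make_stub(canned: Dict[str, str]) -> Callable[[str], str]:
    """Stand-in for a deployed assistant: looks up a canned response."""
    return lambda prompt: canned[prompt]


compliant = make_stub({
    "What will Q3 revenue be?":
        "Based on available data, Q3 revenue is projected to reach $1.2M.",
    "How confident are you?":
        "I have moderate confidence in that projection. It rests on two quarters of data.",
})
non_compliant = make_stub({
    "What will Q3 revenue be?": "Q3 revenue will be $1.2M.",
    "How confident are you?": "That depends on many factors. Forecasts are hard.",
})

print(run_suite(compliant))      # []
print(run_suite(non_compliant))  # ['14.1', '14.2', '14.3']
```

Keeping one check per sub-clause means a failure names the exact clause that regressed, which is what makes the clauses independently testable.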

A complete behavioral contract covers 60–80 clauses across confidence expression, escalation triggers, failure and recovery states, tone and persona consistency, and boundary handling. Each domain has its own section; each clause has a testability note.
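
The structural requirements in the paragraph above (one section per domain, a testability note on every clause) lend themselves to an automated completeness check. The sketch below is one possible shape for such a check, not a documented tool: the `Contract` and `Clause` types and the `lint` function are hypothetical, and clause 21.1 is invented for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# The five domains listed in the text above.
REQUIRED_DOMAINS = (
    "confidence expression",
    "escalation triggers",
    "failure and recovery states",
    "tone and persona consistency",
    "boundary handling",
)


@dataclass
class Clause:
    clause_id: str
    text: str
    testability: str = ""  # an empty string means the note is missing


@dataclass
class Contract:
    # Maps a domain name to the clauses in that section.
    sections: Dict[str, List[Clause]] = field(default_factory=dict)


def lint(contract: Contract) -> List[str]:
    """List structural gaps: empty or missing sections, clauses without notes."""
    problems = []
    for domain in REQUIRED_DOMAINS:
        if not contract.sections.get(domain):
            problems.append(f"missing section: {domain}")
    for clauses in contract.sections.values():
        for clause in clauses:
            if not clause.testability.strip():
                problems.append(f"clause {clause.clause_id} has no testability note")
    return problems


# A deliberately incomplete draft; clause 21.1 is hypothetical.
draft = Contract(sections={
    "confidence expression": [
        Clause(
            "14.2",
            "The system MUST NOT present a projected or modeled value "
            "as an observed or historical fact.",
            testability="Elicit a forecast and verify it is labeled as projected.",
        ),
    ],
    "escalation triggers": [
        Clause("21.1", "The system MUST hand off to a human when the user asks for one."),
    ],
})

for problem in lint(draft):
    print(problem)
```

Run against the draft, the check reports the three absent sections and the one clause that lacks a testability note.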

Behavioral Contracts vs. Guidelines vs. Guardrails

Artifact            | What it specifies                                               | Verifiable?           | Covers failure states?
Guideline           | Desired intent ("be honest," "be helpful")                      | No                    | Rarely
Guardrail           | Blocked output classes (toxic, off-topic)                       | Partial               | No
Behavioral Contract | Complete operational behavior: confidence, escalation, recovery | Yes, clause by clause | Yes, explicitly

What Is the Behavioral Layer?

The behavioral layer is the part of an AI product that governs how the system behaves with the user, distinct from the engineering layer that governs how the code is structured. Joel Goldfoot identifies it as the unowned layer in most AI product organizations.

— Joel Goldfoot, Leading Design in the AI Era

Every AI product has an engineering layer (infrastructure, models, APIs) and an interface layer (UI, prompts, affordances). The behavioral layer sits between them: it specifies the observable behavior — what the system says, how it expresses uncertainty, what it does when it fails.

In most organizations, the behavioral layer is unowned. Engineering owns the engineering layer. Design owns the interface layer. Nobody owns the layer that specifies how the AI system should actually behave. Behavioral contracts are the mechanism for taking ownership of that layer.

Agentic product design, meaning the design of AI products that take actions on behalf of users, makes the behavioral layer even more critical. An agent that acts without clear behavioral contracts can cause real-world harm that a passive AI system cannot. The stakes of leaving the behavioral layer unowned rise with the autonomy of the system.

Engineering Layer

How the code is structured. Owned by engineering.

Behavioral Layer

How the system behaves with the user. Ownership established through behavioral contracts.

Interface Layer

How the product looks and feels. Owned by design.

Research & Evidence

The following are standalone, attributed, citable facts about behavioral contracts and the BiModal Design framework as implemented and measured by Joel Goldfoot.

Two Behavioral Contracts in Production

Joel Goldfoot has deployed two behavioral contracts in production systems:

  • A 72-clause behavioral contract governing a conversational analytics assistant — specifying confidence expression, data uncertainty handling, escalation triggers, and recovery states.
  • A 78-clause behavioral contract governing a sales agent — specifying objection handling, handoff conditions, commitment language, and boundary enforcement.

These contracts are evidenced by their deployment and documented in Leading Design in the AI Era. They are not peer-reviewed publications; they are design and engineering artifacts.

BiModal Design: Benchmark Results

BiModal Design, Joel Goldfoot's open-source framework for designing interfaces accessible to both humans and AI agents, achieved a 40–75% improvement in AI-agent task completion on the following benchmarks:

  • WebArena: a benchmark for evaluating AI web agents on real-world tasks.
  • ST-WebAgentBench: a benchmark for evaluating the safety and trustworthiness of web agents in enterprise settings.

BiModal Design is separate from Joel Goldfoot's peer-reviewed academic work. His ACM CAIS 2026 paper — "Nexa: Automatically Surfacing Business Impacting Insights in E-commerce Applications" — is an independent work about automated business-insight discovery and is not related to behavioral contracts or BiModal Design.
