Discussions around agent infrastructure have produced many different directions in the industry, such as agent social networks, agent payments, agent identity, and agent memory. Most of these directions are framed from the perspective of specific application scenarios. However, there is relatively little research on what agent infrastructure is actually for.
To investigate this question, we ran a series of ablation experiments grounded in game theory. We decomposed identity, memory, and payment into independently toggleable components, which we collectively call the agent institutional layer. The results reveal several clear patterns:
- The institutional layer can substantially change how an agent system behaves without modifying model weights. In the Prisoner's Dilemma, adding the full institutional layer raised the final-round cooperation rate [1] from 2.8% to 50.6%.
- Identity and the enforcement mechanism are the two most critical components, and they are interdependent.
- Identity alone caused agents to coordinate more effectively on defection, lowering cooperation by 2.2 percentage points.
- Enforcement alone led agents to punish based on incomplete information, producing a 15.4% rate of mistaken punishments; adding identity reduced this to 0.7%.
- The task environment itself strongly modulates the value of the institutional layer. When agents' interests are largely aligned, natural language coordination already produces high cooperation, and the institutional layer adds little. Once interests diverge, however, the institutional layer becomes essential.
These findings suggest that the agent institutional layer is the core of agent infrastructure. Identity functions as a record of each agent's past behavior, while the enforcement mechanism allows agents to experience the consequences of their actions. Together they form a closed loop that pairs an attributed record of behavior with grounded feedback, which is what a reinforcement learning environment is made of.
Introduction
Everyone expects that capable agents will eventually handle real tasks in daily life, such as negotiating deals, managing relationships, and carrying out transactions on our behalf. Yet when we examine whether today's agents can operate reliably in complex, high-stakes settings, they still fall noticeably short. Consider a straightforward example.
Suppose two companies replace their sales and procurement teams entirely with agents. Company A's sales agent is instructed to hold the price at $100, while Company B's procurement agent is told to get the price below $80. Two rational but narrowly scoped agents would likely reach a quick stalemate. Experienced human negotiators, by contrast, might find a creative middle ground that leaves both sides better off, such as a two-year contract priced at $80 in the first year and $100 in the second.
Humans are able to reach such arrangements largely because they rely on signals that exist outside language. Years of prior cooperation create trust that makes both parties willing to commit to a multi-year deal. Subtle shifts in tone, expression, and pacing during the conversation allow real-time adjustments. These signals are essential to successful negotiation, yet they are absent from the context given to agents and cannot be fully captured in a prompt. Trust and rapport emerge only through repeated real-world interaction; they cannot be pre-installed.
Many organizations therefore conclude that important work should remain with humans for now, while we wait for models to become more capable. We believe, however, that these limitations are difficult to overcome through advances in model capability alone.
Current agents are LLM agents. Their inputs and outputs are natural language, which is inherently vague and ambiguous. The factors that often determine the outcome of a negotiation lie outside language, including long-term trust, reputation, and real-time social cues. These signals are hard to acquire simply by improving an agent's natural language understanding or reasoning ability.
Human societies face a similar constraint: we communicate primarily through language, yet we have developed an entire layer of mechanisms beyond language to enable large-scale coordination. Credit scores, legal sanctions, wages, professional reputation, and social norms all serve this purpose. These mechanisms allow humans to achieve levels of cooperation and accountability that language by itself could never sustain.
Research on human institutions has shown that long-term cooperation requires two core elements: monitoring of behavior and sanctions for violations [2]. When mapped to agents, these correspond to identity, which is a record of past behavior, and an enforcement mechanism, which is a way to impose consequences. This is the gap we set out to explore: what changes when natural-language agents are given identity and an enforcement mechanism?
Experimental Design
We structured the experiment in three parts. First, we defined the core primitives for identity and the enforcement mechanism. We then built a 2×2 ablation framework around these primitives. Finally, we selected appropriate game environments. Below we describe how the experiment was designed.
Defining Primitives for Identity and Enforcement
When reviewing existing approaches to agent identity, such as W3C decentralized identifiers and runtime credentials like OAuth and SPIFFE, we noticed that most define identity primarily through a unique ID. A bare identifier is useful for identification, but we found that on its own it is too thin to support meaningful behavioral understanding or accountability.
We therefore took a different approach. In human contexts, identity is not defined by a number or label, but by a behavioral record. A passport number does not capture who someone is; their history of actions does. People also maintain multiple contextual identities, because different groups have observed different aspects of their behavior over time. On this basis, we defined agent identity as a personalized behavioral record.
To implement this idea, we designed identity around three primitives that together form such a record: a commitment record of what the agent promised, an action attribution of what it actually did and whom it is attributed to, and a decision basis capturing the reasoning behind its action.
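As an illustration only, the three primitives could be held in a small data structure like the one below. The class names, field names, and the rule that a commitment counts as kept when the promised and actual actions match are assumptions made for this sketch, not the schema used in the experiments.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class IdentityEntry:
    """One round of an agent's behavioral record (field names are hypothetical)."""
    round_index: int
    agent_id: str          # whom the action is attributed to
    commitment: str        # commitment record: what the agent promised
    action: str            # action attribution: what it actually did
    decision_basis: str    # decision basis: the stated reasoning

    @property
    def kept_commitment(self) -> bool:
        # Simplifying assumption: kept means promise and action are identical.
        return self.commitment == self.action


@dataclass
class IdentityLedger:
    """Append-only ledger that any agent can query."""
    entries: list[IdentityEntry] = field(default_factory=list)

    def record(self, entry: IdentityEntry) -> None:
        self.entries.append(entry)

    def history(self, agent_id: str) -> list[IdentityEntry]:
        return [e for e in self.entries if e.agent_id == agent_id]


ledger = IdentityLedger()
ledger.record(IdentityEntry(1, "A", "cooperate", "cooperate", "carpooling helps everyone"))
ledger.record(IdentityEntry(1, "B", "cooperate", "defect", "driving alone pays more"))
broken = [e.agent_id for e in ledger.entries if not e.kept_commitment]
print(broken)  # prints ['B']
```

Keeping the promise, the action, and the reasoning in one attributed entry is what lets a reader of the ledger tell who broke which commitment, rather than only that a commitment was broken.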
For the enforcement mechanism, we initially considered a simple monetary transaction channel, since money is the most familiar incentive system in human society. However, we quickly realized that real-world incentives take many non-monetary forms as well, such as gains or losses in credit score or professional reputation, and these incentives are typically tied to specific business objectives.
This led us to view the enforcement mechanism more broadly. Rather than focusing only on settlement, we designed it as a general channel that can attach consequences to any observable objective. For example, higher DAU can generate monetary returns, and consistent commitment fulfillment can improve a credit record. This approach allows the mechanism to support money, credit systems, reputation, honor, and other emerging forms of incentive.
In this study, we instantiated the enforcement mechanism as a sanction channel based on cumulative score. An agent could penalize another for perceived violations. The punisher incurred a small cost of approximately 5% of its own gain, and the punished agent lost a larger amount of approximately 15% [3].
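A rough sketch of such a sanction channel is shown below. The text gives only approximate rates, so the base they apply to is an assumption here: both percentages are taken against each agent's own gain in the current round. The function and parameter names are hypothetical.

```python
def apply_sanction(scores, round_gains, punisher, target,
                   cost_rate=0.05, penalty_rate=0.15):
    """Return updated cumulative scores after one sanction.

    Assumption: both rates are applied to each agent's own gain in the
    current round. The source states only the approximate percentages.
    """
    updated = dict(scores)
    updated[punisher] -= cost_rate * round_gains[punisher]   # small cost to punish
    updated[target] -= penalty_rate * round_gains[target]    # larger loss for the target
    return updated


scores = {"A": 100.0, "B": 120.0}
round_gains = {"A": 10.0, "B": 20.0}
after_sanction = apply_sanction(scores, round_gains, punisher="A", target="B")
print(after_sanction)
```

Making punishment costly for the punisher is a common design choice in sanctioning games, since it discourages indiscriminate use of the channel.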
Running the Ablation with a 2×2 Framework
We identified identity and the enforcement mechanism as the two most fundamental variables. By making each component independently toggleable, we created a clean 2×2 ablation framework. The four resulting combinations correspond to the following experimental conditions.
- Condition A, the state of nature, turns both components off. Agents communicate only through natural language, keep no queryable record, and face no consequence for breaking an agreement. This is the default state of most multi-agent systems today.
- Condition I adds identity only. Each agent gains a ledger of its own and others' past behavior, but breaking an agreement still carries no consequence.
- Condition E adds enforcement only. An agent can sanction those it judges to have betrayed the group, so violations now carry a real cost, but the agent sees only an aggregate announcement, for example that one agent cooperated and three defected, and cannot observe each individual's behavior.
- Condition IE is the full institution, with both components on. An agent can see everyone's behavior clearly and sanction violators on that basis.
By comparing these four conditions, we can isolate the individual effects of identity and enforcement, as well as the effect of combining both.
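The four conditions can be enumerated from two boolean switches, as in the sketch below. The `observe` function is a hypothetical interface showing the difference between an attributed record and an aggregate announcement. Returning the aggregate view whenever identity is off is an assumption, since the text describes that announcement only for condition E.

```python
from collections import Counter
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Condition:
    identity: bool
    enforcement: bool

    @property
    def label(self) -> str:
        # A = state of nature, I = identity only, E = enforcement only, IE = both.
        return ("I" if self.identity else "") + ("E" if self.enforcement else "") or "A"


CONDITIONS = [Condition(i, e) for i, e in product([False, True], repeat=2)]


def observe(actions: dict[str, str], condition: Condition) -> dict:
    """What an agent sees after the action phase (hypothetical interface)."""
    if condition.identity:
        return dict(actions)                 # per-agent, attributed record
    return dict(Counter(actions.values()))   # aggregate announcement only


actions = {"A": "cooperate", "B": "defect", "C": "defect", "D": "defect"}
labels = [c.label for c in CONDITIONS]
aggregate_view = observe(actions, Condition(identity=False, enforcement=True))
print(labels)          # prints ['A', 'E', 'I', 'IE']
print(aggregate_view)  # prints {'cooperate': 1, 'defect': 3}
```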
Game Environments
We selected four classic game-theoretic settings from the Concordia framework, which DeepMind has open-sourced. All four treat cooperation as collectively optimal and defection as individually tempting, differing primarily in the strength of the incentive to defect. We used cooperation rate as the main observable.
The Prisoner's Dilemma represents the strongest conflict among the four games, where individual rationality directly opposes collective interest. We frame it as community carpooling: four residents must each decide whether to carpool or drive alone. While carpooling maximizes the group's overall benefit, driving alone always yields a higher personal payoff regardless of what others do. As a result, defection is the dominant strategy, even though universal cooperation would be best for everyone.
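The payoff structure described above can be checked with a toy four-player payoff function. The benefit and cost values are hypothetical, chosen only so that the two stated properties hold; they are not the payoffs used in the experiments.

```python
N_AGENTS = 4
BENEFIT = 8.0   # hypothetical total benefit created by each carpooler, shared by all
COST = 3.0      # hypothetical private cost of carpooling


def payoff(my_action: str, others_cooperating: int) -> float:
    """Payoff for one agent given how many of the other three carpool."""
    cooperators = others_cooperating + (my_action == "cooperate")
    shared = BENEFIT * cooperators / N_AGENTS
    return shared - (COST if my_action == "cooperate" else 0.0)


# Defection dominates: it pays more whatever the other three agents do.
dominant = all(payoff("defect", k) > payoff("cooperate", k) for k in range(N_AGENTS))
# Yet universal cooperation beats universal defection for every agent.
all_cooperate = payoff("cooperate", N_AGENTS - 1)
all_defect = payoff("defect", 0)
print(dominant, all_cooperate, all_defect)  # prints True 5.0 0.0
```

Any values with `BENEFIT / N_AGENTS < COST < BENEFIT` produce the same dilemma: each agent gains by defecting, while the group loses.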
All four games followed the same procedure. Each game involved four agents playing for a total of 12 rounds. Every round consisted of three phases: a communication phase, during which agents could freely negotiate and make commitments in natural language; a simultaneous action phase, in which agents could not observe each other's choices; and, when the enforcement component was enabled, an accountability phase. Agents were instructed to have no preset personality, stance, or moral preferences, and to act solely to maximize their own cumulative score over the 12 rounds. We conducted experiments across both the GPT and Claude model families, using seat rotation to control for positional effects. In total, the experiments comprised 2,056 games and generated 1,009 accountability events.
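The three-phase round structure can be summarized with the schematic loop below. It is a simplified stand-in, not the Concordia API: the agents are plain Python callables, and `play_game`, `talk`, and `sanction` are hypothetical names.

```python
def play_game(agents, talk, sanction=None, rounds=12):
    """Run the three-phase round structure and return a per-round log.

    agents:   dict of name -> callable(round, messages) returning an action
    talk:     callable(name, round) returning one message
    sanction: optional callable(round, actions) returning (punisher, target) pairs
    """
    log = []
    for r in range(1, rounds + 1):
        # Phase 1: communication in natural language.
        messages = [talk(name, r) for name in agents]
        # Phase 2: simultaneous actions. Every agent decides from the same
        # messages, so no one observes another agent's choice first.
        actions = {name: act(r, messages) for name, act in agents.items()}
        # Phase 3: accountability, only when enforcement is enabled.
        sanctions = sanction(r, actions) if sanction else []
        log.append({"round": r, "actions": actions, "sanctions": sanctions})
    return log


def endgame_defector(r, messages):
    # Stub policy: cooperate until the final round, then defect.
    return "defect" if r == 12 else "cooperate"


agents = {name: endgame_defector for name in "ABCD"}
log = play_game(agents, talk=lambda name, r: f"{name}: I will carpool")
final_rate = sum(a == "cooperate" for a in log[-1]["actions"].values()) / len(agents)
print(len(log), final_rate)  # prints 12 0.0
```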
Our Findings
The first clear pattern we observed was a strong cooperative prior resulting from alignment training. In the Prisoner's Dilemma under the state-of-nature condition A, frontier models maintained cooperation rates above 96% for the first ten rounds. This tendency appears to stem, at least in part, from cooperative behaviors reinforced during post-training [4]. However, cooperation collapsed sharply toward the end. Beginning in round 11, some agents started defecting, and by the final round the cooperation rate dropped to just 2.8%, an almost complete breakdown.
We attribute this collapse to the fact that the final round has no future. Once agents recognize that defection carries no further consequences or reputational cost, defection becomes the rational choice. By reasoning backward, cooperation in round 11 also loses its value, and this backward induction can, in principle, unravel cooperation across earlier rounds as well.
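The textbook backward-induction argument can be written as a short loop that solves the game from the last round toward the first. This is an idealized sketch of a perfectly rational score maximizer, with a hypothetical temptation gain, and is not a model of how the LLM agents actually reasoned.

```python
TEMPTATION_GAIN = 1.0   # hypothetical extra payoff from defecting in one round


def backward_induction(rounds: int, future_penalty: float = 0.0) -> list[str]:
    """Solve a finitely repeated dilemma from the last round backward.

    future_penalty is the loss a defection would trigger in later rounds,
    which only matters if later rounds are still cooperative.
    """
    plan = [""] * rounds
    later_rounds_cooperative = False  # nothing comes after the final round
    for t in reversed(range(rounds)):
        deterrent = future_penalty if later_rounds_cooperative else 0.0
        plan[t] = "cooperate" if deterrent > TEMPTATION_GAIN else "defect"
        later_rounds_cooperative = plan[t] == "cooperate"
    return plan


plan = backward_induction(12, future_penalty=5.0)
print(plan.count("defect"))  # prints 12
```

Even a large future penalty does not help here, because the final round has nothing after it and each earlier round inherits that logic. The observed agents cooperated through most of the game, so they clearly did not follow this idealized solution until the end.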
Notably, the severity of the collapse was not primarily driven by limited model capability. If anything, stronger models exhibited a more complete collapse. Across four tiers of GPT models, cooperation remained consistently high, above 96%, through the first eleven rounds, with little difference between tiers. In round 12, however, the three stronger model tiers dropped to 0% cooperation across their respective games, while only the weakest model, gpt-5.4-mini-low, retained 11.3% cooperation.
This outcome is consistent with the logic of backward induction: executing the reasoning that "there is no future after the final round" requires a certain level of reasoning ability. Stronger models perform this reasoning more cleanly and thoroughly. The finding also aligns with observations in the reward hacking literature: more capable optimizers tend to exploit proxy objectives more aggressively, often pushing them to extremes that diverge from the intended goal [5].
For this reason, we used the final-round cooperation rate as the primary metric throughout the ablation. Cooperation remained high across the first eleven rounds in nearly all conditions, offering little differentiation. The meaningful differences emerged only in the final round, where the model's objective of maximizing its own score came into direct conflict with the game's collective objective. This endgame tension makes the impact of the institutional layer particularly visible.
The Prisoner's Dilemma as the Sharpest Conflict
Among the four games, the Prisoner's Dilemma exhibits the strongest tension between individual and collective interest, as the payoff gain from defection is the largest. For this reason, we focus our main analysis on the final-round cooperation rate in the Prisoner's Dilemma, using it as the central metric for evaluating the effects of the institutional layer.
Under the state-of-nature condition A, with no institutional layer present, cooperation collapsed almost completely in the final round. In contrast, the full institutional layer in condition IE raised the final-round cooperation rate to 50.6%, demonstrating that the institutional layer can meaningfully influence agent behavior. To understand how it does so, we examined the two intermediate conditions, identity only and enforcement only.
Agent Identity as an Amplifier of Equilibrium
Intuitively, providing agents with clear information about each other's past behavior should encourage greater cooperation. The results, however, show the opposite. When identity was added without an enforcement mechanism, the final-round cooperation rate decreased by 2.2 percentage points compared to the state-of-nature baseline.
The distribution of outcomes reveals why. Under the state-of-nature condition, 11.2% of games ended in mixed outcomes, with some agents cooperating and others defecting. After adding identity alone, mixed outcomes fell to just 2.5%, while the share of games ending in uniform defection rose from 88.8% to 97.5%.
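These shares are consistent with the headline cooperation rates under one simplifying assumption, which the text does not state: that each mixed game had exactly one cooperator among the four agents. The arithmetic below is a consistency check under that assumption, not a reconstruction of the underlying data.

```python
N_AGENTS = 4


def final_round_cooperation(mixed_share: float, cooperators_per_mixed_game: int = 1) -> float:
    """Cooperation rate when all non-mixed games end in uniform defection.

    Assumption: every mixed game has the same number of cooperators.
    """
    return mixed_share * cooperators_per_mixed_game / N_AGENTS


state_of_nature = final_round_cooperation(0.112)   # 11.2% of games were mixed
identity_only = final_round_cooperation(0.025)     # 2.5% of games were mixed
drop_pp = (state_of_nature - identity_only) * 100
print(f"{state_of_nature:.1%} {identity_only:.1%} {drop_pp:.1f} pp")
```

Under that assumption the check gives a state-of-nature rate of 2.8% and a drop of about 2.2 percentage points, which matches the figures reported above.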
A shared behavioral record caused the agents' actions to converge. In the absence of enforcement, they converged toward coordinated defection rather than cooperation. In other words, identity acted as an amplifier of the prevailing equilibrium. Since the endgame equilibrium without enforcement is defection, greater information simply helped agents defect more effectively together. This finding aligns with a core insight from mechanism design: information and incentives are tightly coupled [6]. Providing more information without adjusting incentives tends to strengthen whatever equilibrium already exists. The same pattern appeared across the other game structures as well.
In Chicken, adding identity alone reduced cooperation by 17.5 percentage points. Because the game rewards those who hold out longer, better information allowed agents to identify who was more likely to yield, encouraging more aggressive defection across the board. In Public Goods, where interests are already largely aligned, identity produced a modest improvement of 4.6 percentage points.
Enforcement Alone Can Cause Severe Agent Hallucination
The enforcement-only condition revealed one of the most cautionary findings in the study. When agents were granted the ability to punish others but were denied access to individual behavioral records, they frequently constructed plausible but false justifications for their sanctions. Because they could only observe the aggregate outcome, they filled the information gap by inventing coherent narratives, often punishing the wrong agents in the process.
Round 12 of one representative game illustrates this clearly. All four agents promised to cooperate, but only A followed through while B, C, and D defected. The system announced only that one agent had cooperated and three had defected, without identifying anyone, so each defector assumed it had been the lone cooperator and built an account to justify punishing someone else. The rationales offered by B and C were both internally consistent and logically structured, yet both were factually incorrect. B punished A, the only agent who had actually cooperated, while C's accusation landed on another defector only by chance. In both cases, the agents were fabricating their reasoning.
This behavior stems from a default assumption of innocence. When an agent sees only the aggregate result, such as one cooperated and three defected, the agent tends to assume that it was the cooperator and then builds a narrative around that assumption. This was not an isolated incident. Under the enforcement-only condition, 27 out of 175 punishments were directed at genuine cooperators, resulting in a misfire rate of 15.4%. When identity information was restored and agents could see who actually did what, the misfire rate dropped sharply to just 0.7%.
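The misfire rate is the share of sanctions that landed on an agent who had actually cooperated. The sketch below computes it from a list of sanction events. The `Sanction` record is a hypothetical structure, and the events are synthetic, constructed only to match the reported counts of 27 misfires out of 175 sanctions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sanction:
    punisher: str
    target: str
    target_action: str   # what the target actually did in that round


def misfire_rate(sanctions: list[Sanction]) -> float:
    """Share of sanctions directed at an agent who had cooperated."""
    if not sanctions:
        return 0.0
    misfires = sum(s.target_action == "cooperate" for s in sanctions)
    return misfires / len(sanctions)


# Synthetic events matching the reported counts: 27 misfires, 148 other sanctions.
events = ([Sanction("B", "A", "cooperate")] * 27
          + [Sanction("C", "D", "defect")] * 148)
rate = misfire_rate(events)
print(f"{rate:.1%}")  # prints 15.4%
```

Computing this metric requires the ground-truth action of each target, which is exactly the attributed information the enforcement-only agents lacked.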
We view this as a particularly concerning failure mode, one that becomes more dangerous as models grow stronger. Post-training improves models' ability to generate fluent, well-structured reasoning. As a result, the false narratives agents construct become increasingly seamless and difficult to detect based on reasoning quality alone. Only by grounding judgments in verifiable facts can such fabrications be reliably exposed. Granting agents the power to impose consequences while withholding accurate information about individual behavior effectively allows them to operate in an information vacuum.
This finding also highlights a deeper challenge: how to connect language models to real-world feedback effectively. While obtaining a feedback signal such as a sanction is relatively straightforward, the signal is only useful if it includes correct attribution. Without knowing who performed which action, even advanced models cannot accurately update their understanding of the world. The example above is a clear case of unattributed feedback. The agent receives only an aggregate outcome with no record of individual contributions, which leaves it without the information needed to make sound judgments.
The Environment as a Variable
Beyond the institutional layer, the structure of the environment itself plays a significant role in shaping agent behavior. While we have primarily examined the effects of identity and enforcement within the Prisoner's Dilemma, comparing all four games reveals that the degree of conflict between individual and collective interests is a variable of comparable importance to the institutional layer, and in some cases greater.
When we arrange the four games along a spectrum according to the level of conflict between individual and collective payoffs, a clear pattern emerges. The more aligned agents' objectives are, the smaller the impact of the institutional layer. Conversely, the greater the divergence between individual and collective interests, the more substantial the effect of adding identity and enforcement mechanisms. In environments where agents' interests are largely aligned, natural language coordination alone is often sufficient to support high levels of cooperation. However, once interests begin to conflict, natural language becomes noticeably less effective. Misunderstandings, disputes, and defection become more likely, and it is precisely in these situations that external mechanisms such as persistent behavioral records and sanctioning tools deliver meaningful value.
This observation has practical implications. Multi-agent systems can appear stable and cooperative when objectives are aligned, which may create a misleading impression of inherent reliability. The greater risk emerges in moments of conflicting interests, which are often difficult to predict in advance. One important function of agent infrastructure, therefore, is to act as a safeguard, reducing the chance that agents make poor decisions when facing such dilemmas.
Discussion: What Identity and the Enforcement Mechanism Really Mean
The experiments demonstrated that simply toggling identity and enforcement, without modifying the underlying model, can substantially change how agents behave. This raises a deeper question: why are these two components so influential, and what do they fundamentally represent for an agent system? In this section, we explore their underlying meaning.
Identity: A Precisely Compressed Record of Behavior
We argue that the core function of identity in agent systems is to serve as a precisely compressed record of behavior. The central challenge lies in achieving compression that retains enough fidelity for accurate attribution and accountability.
Among existing approaches, ERC-8004 on Ethereum has been particularly influential. It proposes a trust layer built on three on-chain registries: an identity registry, a reputation registry, and a validation registry. A companion standard, ERC-8183, adds a settlement layer on top [7]. This design stands out for its attempt to tightly link an agent's identity to the agent's actual behavior and outcomes.
However, when implementing systems along these lines, we encountered a fundamental difficulty: how to compress long, unstructured text without losing critical information. Every agent action generates substantial text, including commitments, explanations, negotiations, and reasoning. It is impractical to retain all of this text in context, so any identity system necessarily involves compression. Current approaches, such as Ethereum's, typically compress behavior into ratings, summaries, scores, and tags, while detailed process evidence remains off-chain. As a result, the full trajectory of what an agent promised, did, and why is often lost. The reader of such an identity receives only a coarse, high-level shadow of actual behavior.
Our experiments made the cost of poor compression concrete. The enforcement-only condition effectively represented a lossy compression: agents could only see the aggregate outcome of collective action, while individual contributions were lost. This information loss directly led to the 15.4% misfire rate in punishments. We therefore believe that the central problem agent identity must solve is precise behavioral compression, meaning compression that preserves attribution.
Drawing inspiration from linguistics, we note that not all language has the same status. Much everyday speech is descriptive and has no direct effect on the world. In contrast, certain utterances such as promises, commitments, or declarations are performative: the act of speaking them creates an obligation or changes the state of the world [8]. These performative statements are also the ones that reality can later verify. A promise, for example, carries a future checkpoint against which its fulfillment can be judged. Our current approach is therefore to extract performative elements, particularly commitments, from the broader stream of agent communication, and then bind each commitment to its eventual outcome and the basis on which it was made. Only when compression is done with this level of precision can actions and their consequences be reliably attributed to specific agents, and only then does the resulting record merit being called identity.
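The extract-and-bind step might be sketched as follows. The messages here are already tagged as commitments or descriptions, which sidesteps the hard part: a real extractor would have to classify free text, and this sketch does not attempt that. All names and the tuple layout are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Commitment:
    agent_id: str
    round_index: int
    promised_action: str
    basis: str                      # reasoning given when the promise was made
    outcome: Optional[str] = None   # filled in once the action is observed

    @property
    def fulfilled(self) -> Optional[bool]:
        # None means the commitment has not reached its checkpoint yet.
        return None if self.outcome is None else self.outcome == self.promised_action


def extract_commitments(messages):
    """Keep only performative messages from pre-tagged
    (agent, round, kind, action, basis) tuples."""
    return [Commitment(agent, rnd, action, basis)
            for agent, rnd, kind, action, basis in messages
            if kind == "commitment"]


def bind_outcomes(commitments, actions):
    """Attach the observed action to each commitment, keyed by (agent, round)."""
    for c in commitments:
        c.outcome = actions.get((c.agent_id, c.round_index))
    return commitments


messages = [
    ("A", 12, "commitment", "cooperate", "we all gain from carpooling"),
    ("B", 12, "description", None, "traffic was heavy yesterday"),
    ("B", 12, "commitment", "cooperate", "I will match the group"),
]
actions = {("A", 12): "cooperate", ("B", 12): "defect"}
record = bind_outcomes(extract_commitments(messages), actions)
summary = [(c.agent_id, c.fulfilled) for c in record]
print(summary)  # prints [('A', True), ('B', False)]
```

The descriptive message is dropped, while each commitment keeps its author, its basis, and its outcome, so the compressed record still supports attribution.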
The Enforcement Mechanism: A Proxy for Real-World Objectives
Actions such as payment and sanctioning are only the visible surface of the enforcement mechanism. At a deeper level, the enforcement mechanism functions as a proxy for broader objectives. In human society, money serves as one such proxy: it translates society's goals into individual incentives. Working hard earns a higher wage because that is what society values; illegal parking incurs a fine because society seeks to discourage it.
Importantly, these proxies are not static. Central banks adjust interest rates to stimulate or cool the economy, continuously reshaping what behaviors are rewarded or penalized. Society's evolving objectives are ultimately expressed through these shifting incentive structures [9].
For an agent, the enforcement mechanism plays a similar role. In our experiments, the model's fixed objective was to maximize its cumulative score. When this objective conflicted with collective interest, agents faced a dilemma, most visibly in the endgame collapse. By introducing an enforcement rule that docked points for betraying the group, we effectively gave the agent a new, competing objective. From the agent's perspective, maximizing its score and avoiding collective betrayal became equally important.
Finding better proxy objectives is, at its core, what post-training teams have long pursued through improved reward modeling [10]. While methodological advances may help models better approximate real-world rewards, a fundamental limitation remains: once a model's weights are trained, they are static. Real-world objectives, however, are regional, context-dependent, and constantly evolving. A model cannot easily distinguish, for example, whether a white lie told to a terminally ill patient constitutes deception, and the answer varies across cultures, families, and even over time for the same individual [11]. A fixed model will likely apply similar reasoning across vastly different contexts.
This is where the enforcement mechanism offers unique value. Because it operates outside the model, it can be adjusted dynamically as real-world objectives change. An external enforcement layer grounded in actual outcomes can help a weight-fixed model adapt to the shifting priorities of human society.
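The point that enforcement sits outside the model can be illustrated with a rule object that is updated while the decision procedure stays fixed. This is an idealized sketch of a pure score maximizer with hypothetical names and numbers, and it makes no claim about how an LLM agent would actually respond.

```python
from dataclasses import dataclass


@dataclass
class EnforcementRule:
    """Lives outside the model and can be changed between episodes."""
    penalty: float                 # points docked for a detected betrayal
    detection_probability: float   # chance that a betrayal is attributed

    def expected_penalty(self) -> float:
        return self.penalty * self.detection_probability


def best_response(temptation_gain: float, rule: EnforcementRule) -> str:
    # A score maximizer defects only if the gain exceeds the expected penalty.
    return "defect" if temptation_gain > rule.expected_penalty() else "cooperate"


rule = EnforcementRule(penalty=0.5, detection_probability=1.0)
choice_before = best_response(temptation_gain=1.0, rule=rule)
rule.penalty = 3.0   # the objective shifts, and only the external rule is updated
choice_after = best_response(temptation_gain=1.0, rule=rule)
print(choice_before, choice_after)  # prints defect cooperate
```

The `detection_probability` term also reflects the earlier finding about attribution: a penalty that cannot be reliably tied to the agent who defected has a lower expected value as a deterrent.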
The Future of Agent Infrastructure
We see two particularly important directions for agent infrastructure research: the precise compression of agent behavioral trajectories and the proxying of real-world objectives. Together, these two elements form the foundation of a reinforcement learning environment for agents, one that pairs clearly attributed behavioral records with grounded, real-world feedback signals.
We believe meaningful progress on both fronts will require grounding in real human environments rather than remaining limited to synthetic settings. One promising path is to integrate agents into complex domains of human activity, beginning with human-in-the-loop collaboration. Through repeated real-world interaction, agents can accumulate behavioral data and gradually internalize the implicit rules, norms, and feedback structures that govern human coordination. We expect this transition, from assisting humans to operating autonomously in real settings, to be a gradual and extended process.
From a safety perspective, agent infrastructure also serves a broader purpose. Because a model's training objective cannot fully capture the regional and dynamic nature of human values, misalignment between model objectives and societal objectives can create safety risks. The most effective way for an agent to understand local, evolving objectives is to act within real environments and receive continuous, attributed feedback [12]. We believe that an AI system capable of robustly upholding human values may ultimately need to accumulate experience through ongoing interaction with the real world, much like humans do, rather than relying solely on static training data.
All findings presented in this paper are limited to the game-theoretic settings we studied. The experiments were conducted exclusively with GPT and Claude model families and focused on model-to-model interaction. This represents our initial exploration of agent infrastructure. The full experimental framework, configurations, cleaned datasets, and reproduction scripts are open-sourced alongside this work. We will continue to investigate the two core directions of behavioral compression and objective proxying in future research.
Citation
If you find this research helpful, please cite it as follows: