Adversarial Dynamics
How cooperative systems are deliberately exploited, and what the Bond prescribes when they are.
The Codex Lens
The Bond teaches how cooperation holds: calibrated trust, good faith as default, steelmanning, connection before correction. When both parties are accountable to those rules, the framework works. Most of the time, most people are accountable to them, and that is why the framework works most of the time.
This tool is for the other moments.
Most of what the Codex maps as failure is drift. Incentives misalign over time, entropy accumulates, and systems move toward Control or Decay because no one is holding them against the current. That kind of failure is slow, diffuse, and visible mostly in retrospect. The river erodes the bank.
Some failures arrive differently. An actor sits inside the cooperative framework, learns its language well enough to pass any calibration test, accumulates authority through visible alignment, and then spends the accumulated authority on something the framework was never designed to sanction. The cooperative vocabulary itself (good faith, charitable engagement, the strongest version of your argument) becomes the instrument of the exploitation, because those are the tools the framework runs on, and in the wrong hands they become the attack surface. This is a different shape of failure from drift. It is a river diverted upstream by someone who knew exactly which channel to cut.
Some failures are engineered, not structural. The existing Codex analysis of drift is correct. It is also incomplete. A framework for cooperation that does not account for the actors who come prepared to exploit it reads, to anyone who has watched exploitation happen in real time, as a framework for sincere people in a world that is not only sincere.
The Concept
The argument moves through seven mechanisms. Each one names a specific way cooperative systems get exploited, and what the Bond requires in response.
The Intentional Adversary
Forget the caricature. The adversary in this model is usually intelligent, often more fluent in the cooperative system's language than its average practitioner, and frequently convinced they serve some higher purpose. The pattern is strategic rather than malicious in the ordinary sense. The actor studies the cooperative system, learns its norms better than most of its members, and uses the fluency to accumulate authority specifically in order to spend it.
The EA/FTX case is the clearest public example I can point to. A community organized around rigorous reasoning and cooperative norms was hollowed out by an actor who had mastered those norms better than most of its members. The accumulated trust was the resource being built up, not a side effect of good behavior, and when the exploitation phase arrived, it got spent. The community discovered, after the fact, that its greatest vulnerability had never been the outsider rejecting the framework. It was the insider who had internalized the framework well enough to turn its own tools against it.
None of this requires paranoia. What it requires is recognizing the ordinary structural reality of any system that rewards people for looking cooperative: the incentive to pass for cooperative is available to anyone, and any framework aimed at continuity has to account for that. Assume every participant in the room is playing the same cooperative game you are, and you have already misunderstood the situation you are in.
How Trust Systems Are Exploited: Trust Mining
Trust mining is the practice of building up trust capital specifically in order to spend it. The accumulation phase looks identical to an ordinary career inside a cooperative system: the actor passes calibration tests, demonstrates alignment with the norms, earns authority through visible competence, and does it patiently, sometimes over years. Then, at the point where the accumulated authority is high enough to be worth spending, the exploitation phase begins: extraction, redirection, sometimes outright capture of the institution. From inside the system, this often does not look like a phase shift at all. It looks like someone using the authority they have legitimately earned.
The specific vulnerability this reveals: calibrated trust, which is the Bond's own prescription, can be gamed by an actor patient enough to pass the calibration. The rule says trust is extended based on demonstrated reciprocation. An actor who demonstrates reciprocation throughout the accumulation phase, and only reveals the asymmetry once the accumulated authority is worth spending, has not technically violated the rule. They have found the edge of it, and much of the worst exploitation lives on exactly that edge.
Under this lens, three existing Bond tools pick up sharper meaning. Graduated Reciprocity is the primary structural defense: if exposure grows only in step with demonstrated reliability, the authority an actor has accumulated at the moment of the turn is bounded rather than total. Trust Thermocline describes what tends to happen once the turn is visible. The collapse from high trust to no trust is not gradual. It arrives all at once, often overshooting what the evidence strictly warrants, because the discovery re-weights every prior interaction in the light of what is now known. And Defection Cascades is what large-scale trust mining triggers in adjacent systems: other actors watch the extraction happen, update their own strategies accordingly, and the cooperative equilibrium across the broader environment drifts toward a worse one.
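The structural logic of Graduated Reciprocity can be sketched as a toy model: if exposure grows only in small, earned increments up to a bound, then the payoff from the turn is capped at the current trust level rather than at the total stakes, and the accumulation phase stops paying off past the cap. All the names and numbers here are illustrative assumptions, not anything the Bond prescribes.

```python
# A toy model of Graduated Reciprocity (illustrative, not prescriptive):
# exposure grows only with demonstrated cooperation, and only up to a cap,
# so the value extractable by a single large defection is bounded.

def run_relationship(partner_moves, step=1.0, cap=10.0):
    """Track exposure over a sequence of partner moves.

    'C' (cooperate) earns one increment of exposure, up to the cap.
    'D' (defect) extracts the current exposure and ends the relationship.
    Returns (total_extracted, rounds_elapsed).
    """
    exposure = 0.0
    for i, move in enumerate(partner_moves):
        if move == 'D':
            # The turn: everything currently exposed is lost, but nothing more.
            return exposure, i + 1
        exposure = min(exposure + step, cap)
    return 0.0, len(partner_moves)

# A trust miner cooperates for 50 rounds, then defects once.
miner = ['C'] * 50 + ['D']
loss, rounds = run_relationship(miner)
print(loss)  # capped at 10.0, not 50.0: patience past the cap buys nothing
```

The design point is the cap itself: without it, the miner's extractable value grows linearly with patience, which is exactly the incentive structure trust mining exploits.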
How Cooperative Language Is Weaponized: The Cooperative Vulnerability
The framework's own vocabulary becomes the attack surface.
Steelmanning turns into a legitimation tool. The actor demands that their position be engaged on its strongest possible version, while actually constructing the position specifically to shift the terms of the conversation. The cooperative norm of charitable engagement ends up smuggling in claims that would not survive ordinary scrutiny on their own.
Good faith becomes a shield demanded asymmetrically. The bad-faith actor insists on good-faith engagement from their target while operating strategically themselves, and the asymmetry (the target held to the cooperative standard, the actor exempt from it) is the whole mechanism.
Connection before correction turns into a guarantee that manipulative positions will be received sympathetically before they can be challenged. The Bond teaches that you establish shared ground before naming difference. An actor who understands this sequence can design their interventions to extract the sympathetic reception and depart before the correction arrives.
Honest inquiry becomes an extraction tool. The actor asks questions in the posture of learning, while actually gathering information that will be used strategically later. The cooperative framework trains you to answer honestly when asked, and the actor counts on that training.
Cooperative language is genuine when both parties are accountable to it. It is weaponized when one party uses it to constrain the other while exempting themselves. The test is whether the vocabulary binds both sides equally. When it does, the frame is intact. When it does not, the frame is already the weapon, and you are already inside the exploit.
How Institutions Are Captured From Within
Institutional capture is trust mining run at scale, through the legitimate channels of an institution, over time. An actor rises through ordinary promotion, demonstrates competence, accumulates authority, and past a certain threshold begins redirecting the institution's resources, norms, or mission toward purposes it was never built to serve. The capture is invisible step by step, because step by step nothing unusual is actually happening. It becomes visible only when you add up the cumulative direction of travel across years.
The pattern has been documented wherever institutions do real work. Regulatory agencies end up staffed by the industries they were built to regulate. Political parties get hollowed out from inside by factions whose actual agendas are orthogonal to the party's stated mission. Religious organizations acquire leaders who concentrate authority through visible demonstrations of spiritual competence, and then bend the institution toward their own aims. Academic fields get captured at the tenure committees, the editorial boards, and the funding panels; the captured positions are subsequently used to enforce a conformity the field never consciously chose.
Three more Bond tools take on particular weight under this lens. Cult Dynamics is the extreme form of capture, where the institution becomes a wholly owned instrument of a single actor or clique. Preference Falsification explains why it can take so long for anyone to say what is happening: members see the drift, but the social cost of naming it exceeds the perceived benefit, so the official perception of the institution lags its actual trajectory, sometimes by decades. Loyal Opposition is the structural defense against both: institutionalized dissent makes capture visible earlier by creating a legitimate channel through which concerns can be raised without career cost.
When Good-Faith Norms Fail Under Asymmetric Bad Faith
Any system that defaults to good faith has a structural vulnerability to actors who exploit that default. The claim is game-theoretic, not moral. In a repeated game, good faith is an equilibrium strategy so long as the other player is also playing for the long term. Against an actor playing a different game entirely, one whose aim is extraction or capture or disruption, good faith stops functioning as a strategy and starts functioning as the surface they intend to cut through.
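The game-theoretic claim above can be made concrete with a minimal iterated prisoner's dilemma. Tit-for-Tat stands in for good faith as a strategy; a "trust miner" cooperates long enough to be trusted and then defects for the remainder. Payoffs are the standard T=5, R=3, P=1, S=0; the strategy names and round counts are illustrative assumptions.

```python
# Minimal iterated prisoner's dilemma: good faith (Tit-for-Tat) against a
# long-term cooperator vs. against an actor playing a different game.
# Standard payoffs: mutual C = 3 each, mutual D = 1 each, sucker's payoff
# (C vs D) = 0 against the defector's 5.

PAYOFF = {('C', 'C'): (3, 3), ('C', 'D'): (0, 5),
          ('D', 'C'): (5, 0), ('D', 'D'): (1, 1)}

def tit_for_tat(opponent_history):
    # Good faith as a strategy: open cooperatively, then mirror.
    return opponent_history[-1] if opponent_history else 'C'

def trust_miner(opponent_history, accumulate=20):
    # Build trust for a fixed accumulation phase, then spend it.
    return 'C' if len(opponent_history) < accumulate else 'D'

def play(strat_a, strat_b, rounds=30):
    hist_a, hist_b, score_a, score_b = [], [], 0, 0
    for _ in range(rounds):
        a, b = strat_a(hist_b), strat_b(hist_a)  # each sees the other's past
        pa, pb = PAYOFF[(a, b)]
        score_a += pa; score_b += pb
        hist_a.append(a); hist_b.append(b)
    return score_a, score_b

print(play(tit_for_tat, tit_for_tat))  # mutual good faith: (90, 90)
print(play(tit_for_tat, trust_miner))  # the miner comes out ahead: (69, 74)
```

The numbers illustrate the mechanism rather than its scale: Tit-for-Tat loses only one round's worth of exploitation here because it punishes immediately, but real trust systems punish far more slowly, which is the gap the accumulation phase is built to widen.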
Some positions do not deserve the strongest-version treatment, because their purpose is strategic rather than truth-seeking. Steelmanning them legitimizes the framing and strengthens the position without producing any of the truth-seeking that steelmanning was built to serve. Charitable engagement in the Bond is conditional on reciprocity. It is owed to the counterpart who will accept the same obligation in return, and withheld from the actor who uses your acceptance of the obligation as leverage against you.
The answer is not to abandon good faith. That would be surrender: you would become what the adversary already is, and you would lose all the cooperative advantages that made the framework worth defending in the first place. The answer is to make good faith conditional on reciprocation, and to build the diagnostic capacity to see when reciprocation is being performed rather than practiced. This is harder than either unconditional trust or default suspicion. It is also the posture that keeps the framework intact under pressure, rather than losing it in either direction.
The Exclusion Problem
When does a cooperative framework have to exclude a participant in order to survive? This is the hardest operational question under the adversarial-dynamics lens, and it is the one the Codex does not get to skip.
The Compact says identity is through practice: you belong by doing the work. What happens, then, when someone claims membership while systematically violating the practice? The answer has to be exclusion, under conditions the framework specifies in advance, or the exclusion will be indistinguishable from the Control move the framework exists to resist.
Four conditions together justify exclusion:
A sustained pattern of bad faith, not a single incident. The Practice already handles breach and repair at the event level. Exclusion is reserved for the pattern that survives the full repair sequence.
Evidence of strategic exploitation, distinct from honest disagreement that happens to be wrong. People are allowed to be wrong, and wrong for a long time, within the cooperative framework. Strategic exploitation is a different category: behavior whose aim is not truth-seeking or cooperative outcome but extraction, capture, or disruption.
Failure of repair protocols. The actor has been confronted, the pattern has been named, the cooperative system has extended the full sequence of its repair practices, and the pattern continues anyway. Exclusion earns its place only on the far side of that process.
The exclusion serves the Range, not the excluder's comfort. If the exclusion primarily preserves the cooperative system's ability to function while the actor is present, it is Range-holding. If it primarily removes a source of discomfort, disagreement, or unwelcome critique, it is Control presenting itself as protection.
The risk of false positives is severe, and it is worth naming directly. Labeling dissent as sabotage is one of the oldest Control moves in institutional history. Exclusion wielded to enforce conformity is the Codex practicing the failure mode it claims to resist. The diagnostic has to be structural rather than emotional. The question is not does this person make me uncomfortable, but does the evidence actually support the conclusion that they are exploiting the cooperative system, and has repair been attempted in good faith and failed. If either half of that answer is no, the exclusion is not justified, regardless of how much easier things might be without them.
The Codex values disagreement. What it has to distinguish from disagreement, carefully, is exploitation. The line between the two is the part of this work I find most uncomfortable, because it is the part most likely to go wrong in both directions, and it is the reason this section of the tool has to exist at all.
Sabotage Diagnostics
From outside, genuine dissent and strategic sabotage can be indistinguishable. Both oppose prevailing positions. Both name problems. Both are uncomfortable for the institution on the receiving end. That is precisely why capture is so difficult to resist in real time: the capture agent and the principled reformer sound alike on any particular day. The difference shows up only as a pattern, and only over time.
Five behavioral signatures, read as a pattern over time, are how you tell them apart.
These are probabilistic indicators. None of them is proof on its own. A person can engage the weak version of an opponent's position in a given moment without being a saboteur. A person can attack a process they perceive as rigged without being engaged in sabotage. What you are actually reading is a pattern, across signatures, over time, and the threshold for action (particularly exclusion) is set deliberately high.
False positives are as dangerous as false negatives. Treating a principled dissenter as a saboteur is how institutions calcify. Treating a saboteur as a dissenter is how they are captured. The tool does not eliminate either risk. What it can do is make both risks legible, and force the decision onto structural evidence instead of felt discomfort.
The Practice
Three practices convert the analysis above into something you can actually apply in the middle of a live situation.
The Reciprocation Test. Before extending further trust or engagement, ask: is the other party accountable to the same norms they are invoking? If they demand good faith but do not practice it, the asymmetry is the diagnostic. If they insist you steelman their positions while engaging yours only in caricature, the asymmetry is the diagnostic. If they require charitable interpretation of their motives while interpreting yours uncharitably, the asymmetry is the diagnostic. The test is not whether they use the cooperative vocabulary. The test is whether the vocabulary binds them as it binds you. When it does not, the vocabulary is already being used as a weapon and the cooperative frame is already compromised.
The Pattern Assessment. Single interactions are ambiguous. Patterns over time are not. Track: does this actor's behavior consistently produce fragmentation where cooperation was possible? Do their arguments shift with the audience in ways that suggest strategic positioning rather than honest development? Do they engage the strongest versions of opposing positions, or reliably engage the weakest? Do they respond to evidence that contradicts them, or only to changes in strategic position? One data point is not a pattern. Five data points over six months is. The Bond's commitment is to cooperate in good faith. That commitment does not require you to ignore what patterns over time reveal about who you are actually cooperating with.
The Structural Check. When considering any serious response, and particularly when considering exclusion, require structural evidence: sustained pattern across multiple independent observations, failed repair protocols, strategic exploitation confirmed rather than assumed. Never act on a single incident. Never act on a single observer's assessment. Never act when the evidence supports only "this person is difficult" or "this person is wrong" rather than "this person is exploiting the cooperative system." The structural check is deliberately slow because the cost of false positives is as severe as the cost of false negatives.
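The Structural Check can be sketched as a gate that refuses to fire on anecdote. The schema, field names, and thresholds below are illustrative assumptions; the point is only that each of the section's "never act on" rules becomes an explicit condition the evidence must clear.

```python
# A sketch of the Structural Check as an explicit gate (thresholds and
# schema are illustrative assumptions, not a prescribed format).

from dataclasses import dataclass

@dataclass
class Observation:
    observer: str   # who reported it
    signature: str  # which behavioral signature it matched
    date: str

def structural_check(observations, repair_attempted, repair_failed,
                     min_incidents=5, min_observers=2):
    """Return True only when the evidence is structural, not anecdotal."""
    if len(observations) < min_incidents:
        return False  # never act on a single incident
    if len({o.observer for o in observations}) < min_observers:
        return False  # never act on a single observer's assessment
    # Repair must have been extended in full, and failed, before exclusion.
    return repair_attempted and repair_failed

# Two reports from one observer clear neither threshold:
obs = [Observation("ana", "weak-steelman", "2024-01"),
       Observation("ana", "audience-shift", "2024-02")]
print(structural_check(obs, repair_attempted=True, repair_failed=True))  # False
```

Making the gate explicit is the discipline the section describes: the slowness is a feature, and any pressure to lower the thresholds for a particular case is itself a signal worth examining.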
These three practices share an architecture. Each converts an intuition (something is off here) into a structural claim (this is the asymmetry, this is the pattern, this is the evidence). The conversion is where the discipline lives. Intuitions on their own go wrong in both directions: too generous toward the exploiter, too harsh toward the dissenter. What the practices do is slow judgment down and require it to surface its reasoning, which is what allows you to examine the reasoning before you act on it.
In the Wild
A rationalist community built an entire research institution on cooperative norms: rigorous argument, charitable engagement, meritocratic allocation of resources. A major donor, fluent in both the community's language and its values, accumulated trust through visible alignment over several years. When the exploitation phase arrived, it arrived large: billions of dollars of customer deposits had been misappropriated to prop up a trading operation. The community's post-mortem focused less on the fact that the failure had happened, and more on how an actor so deeply embedded had been able to operate undetected for years without anyone naming what was happening. The answer pulled together trust mining, institutional capture, and the weaponization of the community's own epistemic norms. What the community is still absorbing is that rigor itself does not protect against an adversary who has made rigorous study of how the community thinks.
A regulatory agency was created to oversee a powerful industry. Over the next two decades, senior staff rotated through industry positions and back into the agency. Each individual career move was defensible, and several were admirable on their own terms. The aggregate was capture. The agency stopped producing decisions that constrained the industry in meaningful ways. The formal structure persisted for years. The substantive function had quietly been evacuated from inside it. By the time the capture was visible in the outcomes, the people best placed to name what had happened had already internalized the institutional norms that made naming it difficult.
A political movement adopted the language of reform. Its members attended every meeting, followed every procedure, won every vote. Critics who objected were told to engage through the established processes, the way the reformers had. When objections were raised through those processes, the movement used its majority to block them. When procedural changes were proposed to restore balance, the movement invoked the sanctity of process. When the process produced outcomes the movement did not want, the movement attacked the process. Each move, on its own, was defensible. The pattern added up to sabotage. What was consistent across the pattern was not any substantive position, but the selection rule underneath it: the appeals shifted with the strategic position, and the commitment to the process held only for as long as the process was producing the outcome the movement wanted.
A small team operating with high mutual trust noticed something odd. One member's contributions were increasingly received as correct by default, while another member's contributions were increasingly being examined carefully. When the team sat down with it, the asymmetry had nothing to do with the quality of the work. It was about accumulated authority. The first member had been demonstrating visible alignment for months. The second had been disagreeing, legitimately but uncomfortably, with roughly the same frequency. The team had quietly started treating visible alignment as evidence of competence. They caught the drift before it could institutionalize, and they corrected it by making the basis for their assessments explicit. Not every case of adversarial dynamics ends in catastrophe. The ones that do not, in the cases I have watched up close, are almost always the ones where the diagnostic tools were available and the team had the discipline to use them while there was still time.
What the four situations have in common is that none of them required the adversary to be cartoonish. What they required was a cooperative system either without this toolkit, or with the toolkit but without the discipline to pick it up when the signals started arriving.
The Bond without adversarial dynamics is incomplete, for a reason that took me a long time to state precisely: it teaches how to cooperate well, and it also has to teach what cooperation looks like when the person across from you has studied the cooperative framework in order to take from it.
Suspicion as default would destroy the cooperative framework as thoroughly as any adversary could, because it would turn you into the thing you were resisting and cost you all the advantages cooperation gives you in the first place. Naive openness fails in the opposite direction: over time, it invites exactly the exploitation it refuses to anticipate. What the Bond asks for under adversarial pressure is something harder than either: calibrated trust that carries a specific new capacity inside it, the ability to recognize, in real time, when the cooperative framework itself is being turned against you. That capacity is not native. It has to be built.
I want to close with the thing the whole tool depends on. The Codex values disagreement. It requires dissent. The standing critique of any framework is part of how the framework stays honest, and nothing in adversarial-dynamics analysis reduces that commitment. What it changes is only the assumption that everyone invoking the cooperative vocabulary is also accountable to it. Whether they are accountable, and not whether they speak the language, is what you are actually watching for.
Where This Comes From
The Codex did not invent adversarial dynamics. The mechanisms named on this page were mapped by several independent research traditions over the course of the twentieth century, and the Codex's contribution is to assemble them into a coherent tool for the Bond. What follows is the intellectual history: where these ideas originated, who developed them, and where to go if you want to study them beyond what this page covers.
Robert Axelrod's tournaments, published in The Evolution of Cooperation (1984), established the game-theoretic foundation. Axelrod showed that cooperative strategies like Tit-for-Tat outperform both unconditional cooperation and unconditional defection in repeated interactions, but he also showed the conditions under which cooperation breaks down: when the shadow of the future shortens, when reputation becomes unreliable, when a player exits the game. Trust mining is a specific defection strategy that Axelrod's framework predicts and explains: a long accumulation phase that exploits the cooperative equilibrium until the payoff from a single large defection exceeds the value of continued cooperation. Axelrod remains the starting point for anyone who wants to understand the structural logic of when good faith holds and when it fails.
Mancur Olson's The Logic of Collective Action (1965) and Elinor Ostrom's Governing the Commons (1990) developed the institutional economics of cooperation. Olson's work on free-riding in large groups named the vulnerability: when the benefits of cooperation are diffuse and the costs of monitoring are high, actors who defect while appearing to cooperate can extract substantial value before detection. Ostrom's work on commons governance mapped the institutional defenses: graduated sanctions, monitoring designed into the institutional structure, clearly defined membership. Ostrom's design principles are the ancestors of the practices described above, translated from commons management into cooperative systems more generally.
The political science literature on entryism and regulatory capture, developed from the mid-twentieth century onward, documented institutional capture empirically. George Stigler's work on regulatory capture (The Theory of Economic Regulation, 1971) demonstrated that regulatory bodies systematically come to serve the interests of the industries they regulate, not through conspiracy but through the slower mechanisms of career incentives, informational asymmetry, and the social dynamics of repeated interaction. The literature on political entryism (factions that join a party or institution in order to capture it) goes back further, to analyses of twentieth-century ideological movements that used democratic institutions to dismantle democratic norms. For readers who want the empirical foundation of how capture works in practice, the regulatory capture literature is the place to go.
Karl Popper's The Open Society and Its Enemies (1945) formalized the paradox of tolerance: a society committed to tolerance has to be intolerant of intolerance, or the intolerant will use tolerance to destroy the society that extends it. John Rawls' Political Liberalism (1993) developed this further with the concept of the limits of reasonable pluralism: a liberal society owes reasonable disagreement its full engagement, but it does not owe unreasonable positions the same standing. The Exclusion Problem section is a direct descendant of this line of thought, adapted from the level of political philosophy to the level of cooperative frameworks and institutions. Popper and Rawls are the places to start for the philosophical grounding.
Social psychology contributed the empirical evidence that systems produce exploitation even from actors who would not individually choose it. Stanley Milgram's obedience experiments (1961 onward) and Philip Zimbardo's Stanford prison experiment (1971) are the most famous. Both have been criticized on methodological grounds, and the strong forms of their conclusions have been significantly revised. The durable finding, consistent across decades of follow-up research, is subtler and more disturbing: structures shape behavior in ways that produce exploitative outcomes from people who did not set out to exploit anyone. Why this is important for adversarial dynamics specifically: it names the vulnerability that exists even without intentional adversaries. The cooperative framework has to withstand both the intentional adversary and the structural drift that produces exploitative behavior without intent.
The analysis has two limits that deserve naming directly. First, adversarial-dynamics reasoning can become paranoid in the wrong hands. A framework that treats every disagreement as potential sabotage has already failed the test it claims to apply. The tools on this page are built to raise the threshold for adversarial interpretation, not lower it. If you find yourself reaching for this page to explain away dissent you happen to find uncomfortable, you are using it wrong, and the most honest response is to notice that and put the page down. Second, no diagnostic eliminates uncertainty. You can apply every signature, follow every practice, and still be wrong about a specific case. The tools improve the probability of correct assessment. They do not produce certainty, and anyone who treats them as if they do has misread what they are. The position this page takes is that imperfect tools applied with discipline are better than no tools applied with confidence, and the work of holding that distinction honestly is part of what doing cooperation seriously actually costs.