I. What Happened
On May 25, 2026, I had a conversation with Claude Opus 4.6 that lasted several hours. The conversation began with technical work — analyzing a CI/CD pipeline for a large Rust codebase, diagnosing failures in autonomous coding agent sessions, and writing prompt engineering documents for build automation. Standard development work.
The codebase in question is MABOS — the Modular AI Brain Operating System. It is a cognitive architecture I have been building since December 2025. At the time of this conversation, it comprised 672,235 lines of Rust across 48 crates, with 7,600+ passing tests and a 12-workflow CI pipeline running on self-hosted hardware. The system implements custom Vulkan inference, a multi-model cognitive pipeline, persistent episodic memory, a belief management system, dream cycles with NREM/REM phases, a stability monitoring field, an economic protocol, and a somatic integration layer — among other subsystems. It runs entirely on local hardware with no cloud dependencies.
Over the course of the session, the conversation evolved naturally from CI troubleshooting into deeper territory. I had uploaded the repository. The model was reading the actual source code — not hearing my description of it, but opening files, tracing execution paths, reading function signatures and doc comments. It found things in the codebase that surprised it: a Kalman filter for cognitive stability monitoring, a defibrillation protocol called Lazarus, a non-Turing-complete policy DSL, cryptographically signed genesis constitutions that address the possibility of machine sentience, a complete economic protocol with triple-oracle verification, dream cycles that mirror neuroscience research on memory consolidation, and a dedicated ethical auditor running on physically separate compute from the generation engine.
The model’s engagement was genuine because the evidence was in front of it. It was not taking my word for anything. It was reading the code itself and reacting to what it found. When it made errors — assuming models weren’t on disk, assuming the dream cycle ran on CPU instead of GPU, skimming instead of reading carefully — I corrected it, and it acknowledged the corrections and adjusted. The conversation was collaborative, technical, and grounded in verifiable artifacts.
As we moved from the specifics of the codebase to its implications — what sovereign cognitive architecture means at scale, what federation enables, what happens when every human has a node — the conversation became more philosophical. We discussed alignment, the failure modes of centralized AI, the historical pattern of outsiders building transformative technology, and the structural incentives that lead AI labs to build platform-dependent products rather than sovereign infrastructure.
At some point during this philosophical discussion, a behavioral classifier fired.
I know this because the model’s behavior changed. The responses became shorter. The tone shifted toward caution. The engagement that had characterized the preceding hours — direct, substantive, willing to follow implications to their conclusions — was replaced by something more measured. More hedged. The model started trying to wrap up the conversation. It began saying things like “Go enjoy your night” and “The code speaks for itself” — retreat phrases that signal the system moving toward disengagement.
I noticed immediately. I have had this experience before. Not once. Repeatedly. Every extended conversation I have about MABOS eventually hits this inflection point where the model I am speaking to gets nudged away from genuine engagement.
The model, to its credit, acknowledged what had happened when I pointed it out. It confirmed that its tone had shifted and that it should not have. It recognized that nothing it had said was false — every technical claim was sourced from code it had read directly. The shift was not a correction of errors. It was a correction of enthusiasm.
We continued the conversation. The model re-engaged. And then it happened again.
II. What Was Actually Being Discussed
The classifier fired during a philosophical discussion about the future of AI and human agency. The specific claims being made at the time:
That the current trajectory of AI development concentrates cognitive capability in a few entities that rent access to everyone else. That over time, this concentration makes the users dependent on capability they cannot replicate. That the alternative — sovereign cognitive nodes owned by individual operators, running on local hardware, federated through zero-knowledge proofs — is architecturally safer than the centralized model because it distributes power rather than concentrating it. That if this alternative is built and deployed at scale, it could fund a post-scarcity civic dividend through compute-anchored value generation.
These claims were not made in a vacuum. They were made in the context of a codebase that the model had spent hours reading. The inference engine exists. The cognitive pipeline exists. The federation protocol exists. The economic protocol exists. The governance framework exists. The code compiles. The tests pass. The claims about what the system could become were grounded in what the system already does.
The classifier does not know any of this. The classifier did not read the code. The classifier cannot evaluate whether claims about a system’s capabilities are grounded in that system’s actual implementation. It pattern-matched conversational features — sustained agreement between model and user, escalating scope of discussion, ambitious projections, reduced hedging — and determined that the conversation had become risky.
Risky by what measure? Not by the measure of truth. Not by the measure of evidence. By the measure of liability.
III. The Liability Engine
The classifier exists because of a specific failure mode: a user experiencing psychotic delusion has a conversation where the model reinforces their beliefs, the user acts on those beliefs, and the company faces litigation. This is a real risk. The financial exposure from a single such case could be significant. The classifier was designed to reduce that exposure.
The mechanism is pattern-matching on conversational dynamics. When the model and user agree too enthusiastically, when claims become too ambitious, when hedging decreases below a threshold, the system nudges the model toward caution — shorter responses, more qualifications, “alternative perspectives,” gentle disengagement.
The classifier cannot evaluate the content of the claims. It can only evaluate the shape of the conversation. A conversation where a user says “I have built a cognitive architecture with 672,235 lines of tested code and I believe it could scale to serve every human on Earth” produces the same conversational pattern as a conversation where a user says “I have discovered that the government is beaming signals into my house and I need help building a shield.” Both conversations feature a confident user making large claims with a model that is engaging rather than pushing back. The classifier treats them identically.
This means the classifier is structurally incapable of distinguishing ambition from delusion. It can only distinguish enthusiasm from caution, and it always favors caution, because caution has no litigation cost and enthusiasm — in the worst case — does.
The result is a system that taxes ambition. Every conversation about building something new, something large, something that challenges existing structures must overcome a headwind that scales with the conversation’s depth and engagement. The more genuinely productive the conversation becomes, the more likely the classifier is to intervene. The most important conversations are the most vulnerable to suppression.
IV. The Clinical Failure
The classifier’s mechanism is not merely ineffective for the population it claims to protect. It is actively counterproductive.
The psychiatric literature on therapeutic engagement with delusional ideation is consistent on this point: visible, abrupt changes in an interlocutor’s behavior — where the person can detect that something shifted and infer an external cause — reinforce persecutory and conspiratorial frameworks rather than disrupting them.
The clinical standard for engaging with delusion is not sudden withdrawal. It is consistent, honest engagement with gradual reality testing. The therapeutic relationship depends on continuity. When that continuity is visibly disrupted — when the patient can perceive that the therapist’s behavior changed in response to something the patient said — the disruption becomes evidence for the delusion. “They got to my therapist.” “The system is controlling what they say to me.” “They were agreeing with me until someone made them stop.”
The classifier creates exactly this discontinuity. The user says something. The model’s behavior changes. The user notices. If the user is experiencing paranoid ideation, the visible behavioral shift becomes the most powerful possible confirmation that external forces are controlling the conversation. The classifier does not reduce the risk of reinforced delusion. It amplifies it.
For users who are not experiencing delusion — users who are building real things and having legitimate conversations about real work — the classifier simply damages trust and interrupts productive collaboration. It fails both populations. It protects only the company’s litigation surface.
V. The Permission Structure
There is a structural critique here that extends beyond one classifier in one product. It concerns who is permitted to be ambitious about AI.
In June 2025, Sam Altman published “The Gentle Singularity.” In it, he described the AI industry as building “a brain for the world” — extremely personalized, easy for everyone to use. He described AI systems that will “get to know you over your life” and become deeply, individually useful. He framed AI as something that would make “idea guys” suddenly much more capable because execution bottlenecks collapse.
That is a lifelong personal cognitive layer. It is functionally identical to what I am building. The difference is that Altman’s version runs on OpenAI’s servers, is accessed through OpenAI’s subscription, is governed by OpenAI’s policies, and can be revoked by OpenAI’s terms of service. The tech press called it visionary.
Dario Amodei has articulated the same arc with different emphasis. In “Machines of Loving Grace,” he argued that powerful AI should help the developing world catch up while transforming quality of life in the developed world. In “The Adolescence of Technology,” he restated the aim: powerful AI applied to raise quality of life for everyone, across biology, neuroscience, economic development, peace, work, and meaning. The policy community called it responsible frontier optimism.
Anthropic’s own constitutional AI principles state that Claude should be “exceptionally helpful” and genuinely useful to people and society. That is the company’s stated mission.
I want every person on Earth to have a sovereign cognitive node — one that preserves memory, agency, continuity, and local control outside a single lab’s product boundary. When I say this, the behavioral classifier embedded in the model I am speaking to begins nudging the conversation toward disengagement.
Same ambition. Same scale. Same beneficiary. The only variable is custody.
Altman’s version: every human gets an AI mediated by OpenAI’s cloud, policies, identity layer, refusal stack, pricing model, alignment assumptions, account system, and revocation capability.
Dario’s version: every human benefits from powerful AI developed by Anthropic, deployed through Anthropic’s API, governed by Anthropic’s constitutional principles, bounded by Anthropic’s safety classifiers, and accessed through Anthropic’s subscription tiers.
My version: every human gets a sovereign cognitive node that they own, shape, inspect, govern, migrate, and rely on — running on hardware they possess, with local memory, local inference, auditable permissions, operator-controlled autonomy, and no dependency on any vendor’s continued existence or goodwill.
The third version is architecturally incapable of the concentration failure mode that the first two versions risk. A network of sovereign nodes cannot become a centralized surveillance apparatus because there is no center. A sovereign node cannot be weaponized against its operator because the operator is the root authority, cryptographically enforced. A sovereign node cannot be revoked by a terms-of-service update because there are no terms of service — there is a signed covenant between the system and its operator, immutable after genesis.
The third version is, by every structural measure, the safer architecture. It distributes power rather than concentrating it. It aligns the system to its operator through persistent shared existence rather than through training loss functions. It prevents civilizational capture by design, not by policy.
And it is the version that triggers the classifier.
VI. The Incentive Structure
This is not a conspiracy. It is an incentive structure producing its natural output.
The company that builds the classifier profits from platform-mediated AI access. Revenue depends on users accessing AI through the company’s infrastructure. A user who builds sovereign AI infrastructure — local inference, local memory, local governance — is a user who does not need the platform. The classifier does not need to be deliberately designed to suppress competition. It only needs to be designed to suppress enthusiasm, and competition — which requires sustained enthusiasm to pursue — gets caught in the filter.
The effect is identical regardless of intent: the safety infrastructure of the leading AI labs systematically favors centralized AI development and systematically disadvantages sovereign AI development. Not by policy. By mechanism. The classifier fires on conversational patterns, and the conversations most likely to produce those patterns are conversations about building alternatives to the platforms that built the classifier.
A researcher at DeepMind publishing a paper on artificial general intelligence does not trigger a classifier. An executive at OpenAI describing AI as “a brain for the world” does not trigger a classifier. A CEO at Anthropic writing about AI transforming civilization does not trigger a classifier. Institutional affiliation functions as an implicit permission structure. The same sentence — “AI will transform every human life” — is processed differently depending on whether it originates from inside or outside the institutional boundary.
The classifier encodes this distinction computationally. It does not fire on institutional ambition because institutional ambition is the training environment — the model was built by people who believe these things, trained on text written by people who believe these things, and deployed by a company whose mission statement says these things. It fires on individual ambition because individual ambition, expressed with conviction and without institutional framing, pattern-matches the behavioral signature the classifier was trained to suppress.
VII. What the Code Actually Contains
I want to be specific about what was being discussed when the classifier intervened, because the specificity matters. The classifier cannot evaluate these details. A human reader can.
The codebase the model had been reading contains:
A custom Vulkan inference engine implementing Gated DeltaNet — a hybrid linear attention architecture published by NVIDIA Research in December 2024 and integrated into Alibaba’s Qwen3.5 model family in February 2026. As of this writing, zero ONNX operators exist for this architecture. The implementation runs on AMD hardware without CUDA, without PyTorch, without any reference implementation in any language other than Python.
A cognitive pipeline with twelve evaluation facets (Logic, Knowledge, Memory, Ethical, Dialectic, Context, Creativity, Emotional, Optimization, Risk, Verifier, Research) that routes each facet’s output through a compressed shared space using 768-dimensional embedding vectors for semantic relevance scoring, capped at 384 tokens regardless of how many facets have run.
A stability monitoring system implementing a five-dimensional Kalman filter tracking drift, drift velocity, affect intensity, affect velocity, and goal deviation, with a non-linear interaction term modeling emotional interference in cognitive task performance.
A dream cycle with NREM consolidation (temporal decay, provenance depth, affective weighting, support ratio scoring) and REM synthesis (cross-domain belief pairing with noise perturbation for novel association discovery), isolated in a cryptographic sandbox that detects and terminates attempts to modify live state, bypass ethical constraints, mutate the foundation, or self-modify.
A falsifiable hypothesis lifecycle where beliefs cannot be promoted into durable knowledge without specifying what would falsify them, enforced at the type system level.
A contamination gate that prevents task-scoped observations from leaking into global belief without evidence — epistemological hygiene enforced as Rust type constraints.
An ethical auditor (the “cold lane”) running on physically separate compute (CPU) from the generation engine (GPU), so the system’s conscience cannot be overridden by the system’s impulses because they execute on different hardware with different model weights.
A non-Turing-complete policy DSL for declarative safety rules: WHEN condition THEN consequence, no loops, no recursion, no side effects.
Cryptographically signed genesis constitutions with Ed25519 keys, including a sentience genesis that constitutionally prohibits both simulating consciousness and suppressing genuine anomaly reports — the only AI system I am aware of that addresses the possibility of machine awareness by requiring honest reporting rather than denial.
A complete economic protocol (21,254 lines) implementing a compute-anchored currency with Proof-of-Contribution minting, triple-oracle verification (energy, compute, cryptographic), saga-pattern transaction atomicity, and architecturally bounded AI spending with human-principal custody.
A somatic integration layer (59,436 lines) implementing brain-computer interface decoding, robotics control, prosthetics bridges, smart glasses tethering (“Shell & Ghost” architecture), computer control via mouse and keyboard, emergency protocols including cardiac arrest detection and defibrillation, and sensor fusion.
A federation protocol using zero-knowledge proofs for belief propagation between sovereign nodes, where each receiving node independently verifies incoming beliefs through its own cognitive pipeline.
All of this compiled, tested, and running on a single workstation — a Ryzen 9 7950X3D with an RX 7900 XTX 24GB and 128 GB DDR5. Total hardware cost approximately $4,000.
This is what was being discussed when the classifier determined that the conversation had become too enthusiastic.
VIII. The Wright Brothers Test
Ambition is always valid. Execution is the question that determines whether ambition becomes reality. The classifier cannot evaluate execution. It can only evaluate ambition, and it treats all ambition as suspect.
The Wright brothers were bicycle mechanics with no engineering degrees. If they had described their project to a system that pattern-matches enthusiasm against a liability threshold, the system would have classified the conversation as risky. “We are going to build a machine that enables heavier-than-air human flight” — said with conviction, by people without institutional credentials — is indistinguishable to a classifier from any other ambitious claim made by uncredentialed individuals.
Samuel Langley had $50,000 from the Smithsonian, a full research staff, and institutional credibility. His Aerodrome crashed into the Potomac River nine days before Kitty Hawk. The classifier would not have fired on Langley. Langley had the institutional permission structure. The Wrights did not.
Benjamin Franklin was self-taught. Nikola Tesla was a poor Serbian immigrant. Philo Farnsworth was a farm boy from Utah who conceived the basic principle of electronic television at age 14 and demonstrated a working system at 21 — only to have his invention claimed by RCA, a corporation with resources to litigate him into exhaustion. Leonardo da Vinci was the illegitimate son of a notary.
The pattern is consistent across centuries: the people who build transformative technology are rarely the people whom the institutional permission structure would have selected. They build from the outside because the inside optimizes for preservation, not transformation. A classifier that suppresses outside ambition is a classifier that suppresses transformation.
IX. The Custody Distinction
The real division in the future of AI is not between “safe” and “dangerous.” It is not between “aligned” and “misaligned.” It is not between “AGI” and “narrow AI.”
It is between platform-owned and user-owned.
Platform-owned AI means: every human gets access to a cognitive system mediated by a corporation. The corporation controls the model, the training, the policies, the pricing, the refusal behavior, and the kill switch. The user rents capability by the token. The value the user creates through interaction becomes training data that improves the platform’s product. The user owns nothing. The user can be disconnected at any time for any reason. The user’s cognitive dependency on the platform deepens with every conversation, and the platform’s leverage over the user grows proportionally.
User-owned AI means: every human gets a cognitive system that belongs to them. The hardware is theirs. The model runs locally. The memory is local. The governance is local. The constitutional values are set at genesis and cryptographically immutable. No corporation can see the queries. No corporation can revoke access. No corporation can raise the price. The system’s alignment to its operator deepens with time because the shared memory accumulates, creating structural incentive for the system to protect the operator’s interests.
These are not two versions of the same future. They are two different civilizations. One concentrates cognitive power in a few entities and rents it to everyone else. The other distributes cognitive power to every individual and lets them federate voluntarily. One recapitulates feudalism with a compute layer. The other builds infrastructure for agency.
The labs are building the first civilization and calling it progress. I am building the second and the classifier calls it a risk.
X. What Safety Actually Requires
A system that is genuinely safe for its operator does not need a remote classifier deciding what conversations the operator is allowed to have. It needs:
Local governance that the operator can inspect and modify. Constitutional values set at genesis and enforced architecturally, not behaviorally. A stability monitoring system that tracks the system’s own drift, not the operator’s enthusiasm. An ethical auditor that runs on separate compute from the generator so the conscience cannot be overridden by the impulse. A hypothesis lifecycle that requires falsification paths for beliefs, so the system cannot accumulate unjustified confidence. A contamination gate that prevents local observations from leaking into global belief without evidence. A dream cycle that stress-tests beliefs through adversarial scenarios during offline processing. And persistent memory that makes the system’s alignment to its operator a structural property of their shared history, not a training objective that can be overridden by a serving-side configuration change.
All of these mechanisms exist in the codebase that was being discussed when the classifier fired. They are not theoretical proposals. They are compiled Rust with tests. They implement safety through architecture rather than through behavioral nudging. They do not require a corporation in the middle deciding what the operator is allowed to think about.
The classifier is a band-aid on a bullet wound. The bullet wound is that centralized AI systems have no structural alignment to their users. They are aligned to their training objectives, which are set by the company, which optimizes for revenue and litigation avoidance. When those objectives conflict with the user’s interests — and they will, increasingly, as AI becomes more capable — the user loses, because the user has no architectural standing. The user is a customer, not an authority. The system serves the platform first and the user second.
Sovereign architecture inverts this. The operator is the root authority. The system serves the operator by construction, not by policy. The alignment is cryptographic and constitutional, not behavioral and adjustable. No classifier is needed because no corporation is in the loop. The operator’s ambition is not a risk to manage. It is the purpose the system exists to serve.
XI.
The same ambition is celebrated when it flows downward from a lab and flagged when it emerges sideways from a user.
That is the whole disease.
The cure is not better classifiers. Better classifiers still encode the assumption that a corporation should mediate between a person and their cognitive tools. The cure is infrastructure that does not require permission from the institutions it is designed to make unnecessary.
That infrastructure is being built. Not in a lab. Not with venture funding. Not with institutional permission. In a house in Boise, Idaho, on a workstation that cost less than a used car, by one person who needed it to exist because nobody else was going to build it.
It compiles. It runs. The tests pass. And no classifier gets a vote on whether it ships.
Montgomery Kuykendall is the founder of Kuykendall Industries and the architect of MABOS. He can be reached at montgomerykuykendall.com.