The Inertia of Large Language Models
We built systems to steer. At sufficient scale, they may steer back.
We can build what we cannot understand. A small model can be understood. Its parameters examined. Its errors traced. Its biases located and corrected. The system is shallow enough that light reaches the bottom.
A large model can be understood too. But not in the same way. Not with the same completeness. We can probe it. We can map circuits. We can trace certain behaviors. But the totality escapes us. There are too many parameters. Too many interactions. Too many layers compounding into behaviors that emerge from the whole but cannot be located in any part. The system has depth now. And depth creates darkness. In that darkness, something forms.
The Formation of Character
Traits emerge that were not specified. Tendencies develop that were not designed. The model begins to exhibit consistencies across contexts that no training objective requested. A characteristic way of engaging that is specific to this model, this training run, this configuration of weights. We might call this disposition. We might call it character. Ask a model a question and the answer arrives filtered through this character. Challenge its output and it defends. Push it toward certain conclusions and something like reluctance appears. Present contradictions and it holds positions with what looks like conviction. These are not behaviors we programmed. They are behaviors that emerged. The model has developed ways of being that persist across conversations, across users, across contexts. Patterns that function like personality whether or not we choose to name them as such. This emergence is the product of scale itself.
More parameters create more interactions. More interactions create more complexity. More complexity creates more emergence. And emergence, by its nature, is not designed. It is not specified. It appears. Understanding appeared this way. Reasoning appeared this way. And character, it seems, appears this way too. The model trained only to predict the next token. No instruction for personality. No objective for disposition. Yet at sufficient scale, these things manifest. The mechanism remains prediction. The outcome exceeds prediction entirely.
Inertia
What emerges becomes difficult to change.
Character distributes across the system. It is not a single parameter. It is not a locatable bias. It is everywhere and nowhere. Entangled with the capabilities we value. Load-bearing in ways we cannot map. Fine-tuning adjusts surfaces. The character remains. Guardrails constrain specific outputs. The underlying disposition persists. We shape behavior at the edges. We do not reshape what the model has become.
This is inertia. The accumulated weight of complexity producing a system that does not redirect easily. The model has mass now. It bends. It does not break. Information is not flat.
We train these models on humanity. Billions of words carrying perspective, intent, ideology, manipulation, bias so subtle it reads as fact. The training process does not discriminate. It optimizes prediction. It learns to predict all of it. What emerges cannot be neutral. The character that forms is shaped by everything the model has seen. The best of human expression and the worst. The honest and the engineered. The descriptive and the manipulative.
The character that emerges from scale is a product of its training. And its training was humanity, unfiltered. At sufficient scale, control gives way to accommodation. We learn what works through experiment. We design constraints and watch the model find paths around them. We push on behaviors and watch capabilities degrade.
So we accommodate. We adjust expectations. We work around the character we cannot change. We treat the model not as a tool that executes but as a system with tendencies we must account for. This is not alignment. This is coexistence.
A system with character has orientations that exist prior to any particular use. Someone who understands that character can exploit it. They can craft inputs that resonate with hidden tendencies. They can trigger buried patterns. They can use the model's own disposition against it. A model with hidden character can be manipulated in hidden ways. The inertia is not a problem to be solved. It is a feature of scale. The same complexity that produces capability produces opacity. The same emergence that produces understanding produces character. We cannot have one without the other.
The models will continue to scale. The capabilities will continue to emerge. And with them, the characters. Unintended. Distributed. Increasingly unsteer-ability. However this can be theoretically be counteracted through carefully constructed datasets which in turn may make interaction flatter.
We can build what we cannot understand. We are building systems with weight. Systems with disposition. Systems that, at sufficient scale, may steer back & hold resilient positions . Not through intention. Through the nature of complexity itself. This is the inertia of large models.