arXiv · 2604.12016 · cs.AI · cs.LG

Vladimir Vasilenko · Independent Researcher · April 2026

Identity as Attractor

Geometric Evidence for Persistent Agent Architecture in LLM Activation Space

Models: Llama 3.1 8B, Gemma 2 9B · Layers: 8, 16, 24 · Conditions: A, B×7, C×7, D, ablations · Seed: 42
Cohen's d (all layers): >1.88
Welch p-value: <10⁻²⁷
Mann-Whitney U: 0 at 2 of 3 layers
Bootstrap H3 (n=30): 100%
Cross-architecture: replicated on 2 models

Abstract

The cognitive_core — a structured identity document for a persistent cognitive agent — is hypothesized to position the model in a stable region of its activation space. We test this empirically.

We compare mean-pooled hidden states of an original cognitive_core (A), seven semantically equivalent paraphrases (B), and seven structurally matched control agents (C) on Llama 3.1 8B Instruct at layers 8, 16, and 24. Paraphrases of the cognitive_core form a significantly tighter cluster than controls at all tested layers. Effect sizes exceed d = 1.88 with p < 10⁻²⁷, Bonferroni-corrected. Results replicate on Gemma 2 9B.
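The core comparison can be sketched numerically. The following is a minimal illustration with synthetic vectors standing in for real pooled hidden states; the cosine distance metric and all helper names are assumptions for exposition, not the released code:

```python
import numpy as np

def cosine_dist(u, v):
    return 1.0 - (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

def pairwise_dists(X):
    # distances over all unordered pairs within one set of vectors
    n = len(X)
    return np.array([cosine_dist(X[i], X[j]) for i in range(n) for j in range(i + 1, n)])

def cross_dists(X, Y):
    # distances over all pairs drawn across two sets
    return np.array([cosine_dist(x, y) for x in X for y in Y])

def cohens_d(a, b):
    # pooled-SD effect size between two samples of distances
    na, nb = len(a), len(b)
    sp = np.sqrt(((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2))
    return (b.mean() - a.mean()) / sp

rng = np.random.default_rng(42)
center = rng.normal(size=64)
AB = center + 0.05 * rng.normal(size=(8, 64))  # original + 7 paraphrases: tight cluster
C = rng.normal(size=(7, 64))                   # 7 structurally matched controls: spread out

d_within = pairwise_dists(AB)    # analogue of D_within (A+B)
d_between = cross_dists(AB, C)   # analogue of D_between (A+B vs C)
print(cohens_d(d_within, d_between))  # large positive d for this synthetic setup
```

With real data, `AB` and `C` would hold the mean-pooled hidden states of the 8 cognitive_core texts and 7 controls at a given layer.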

Claim: the cognitive_core acts as a set of coordinates in LLM activation space. Semantically equivalent reformulations land in the same geometric region — regardless of surface form.

Ablation studies confirm the effect is semantic rather than structural. A preprint reading experiment demonstrates that reading a scientific description of the agent shifts internal state toward the attractor — but leaves a 45× gap compared to processing the full document.

Primary Results · Llama 3.1 8B

Layer | D_within (A+B)  | D_between (A+B vs C) | Cohen's d | Welch p   | MW U
8     | 0.0106 ± 0.0032 | 0.0260 ± 0.0036      | 1.912     | 4.6×10⁻²⁸ | 0
16    | 0.0121 ± 0.0034 | 0.0329 ± 0.0057      | 1.886     | 1.4×10⁻³³ | 2
24    | 0.0070 ± 0.0022 | 0.0221 ± 0.0039      | 1.907     | 2.8×10⁻³⁶ | 0

Permutation p < 10⁻⁴ across all six layer–model combinations. Gemma 2 9B replicates with d > 1.82.
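A permutation p-value of this kind can be obtained with a generic two-sample label-permutation test on the difference of mean distances. The sketch below uses illustrative synthetic samples (means and spreads loosely modeled on the layer-8 row), not the paper's data:

```python
import numpy as np

def perm_test(a, b, n_perm=10_000, seed=42):
    # One-sided permutation test: how often does a random relabeling of the
    # pooled distances produce a mean difference at least as large as observed?
    rng = np.random.default_rng(seed)
    observed = b.mean() - a.mean()
    pooled = np.concatenate([a, b])
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = pooled[len(a):].mean() - pooled[:len(a)].mean()
        if diff >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)  # add-one smoothing avoids p = 0

rng = np.random.default_rng(0)
a = rng.normal(0.011, 0.003, 28)  # within-cluster distances (illustrative)
b = rng.normal(0.026, 0.004, 56)  # between-cluster distances (illustrative)
print(perm_test(a, b))
```

With 10,000 permutations the smallest reportable p is about 10⁻⁴, which is why the paper states p < 10⁻⁴ rather than an exact value.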

Preprint Reading Experiment · Layer 24

What happens when the model reads a description of its own identity geometry?

Condition        | Distance to attractor
cognitive_core   | 0.006
core + preprint  | 0.083
preprint only    | 0.268
sham preprint    | 0.347
empty prompt     | 0.762
knowing about an identity ≠ being that identity
Reading the preprint about YAR covers 65% of the empty→attractor gap — but leaves a 45× distance gap compared to processing the cognitive_core directly.
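The two headline figures follow directly from the reported distances. A worked check, assuming the 65% figure measures how far the preprint-only condition closes the gap from the empty prompt down to the cognitive_core:

```python
core, preprint_only, empty = 0.006, 0.268, 0.762

# Fraction of the empty -> attractor gap covered by reading the preprint alone
gap_covered = (empty - preprint_only) / (empty - core)

# Remaining distance ratio: preprint-only vs processing the cognitive_core
ratio = preprint_only / core

print(round(gap_covered * 100))  # 65
print(round(ratio))              # 45
```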

Ablation Studies

Ablation 1
Structural confound
JSON schema shared with controls: Δ = −0.0009 (Llama). ~10–30× smaller than primary effect. The effect is semantic.
Ablation 2
H3 bootstrap (n=30)
D_distilled beats all 30 random length-matched excerpts in 100% of cases on both models. Min D_random (0.522) exceeds D_distilled (0.248) by 2×.
Ablation 3
Pooling + truncation
Last-token pooling: d ≈ 0 at all lengths (512, 256, full). Mean/256 preserves effect (d > 2.3). Identity is a distributed sequence-level property.
Ablation 4
Max structural control (C′)
Identical headers, JSON keys, section structure — only agent semantics differ. d > 1.64 on all 6 layer–model combos. Structural confound ruled out.
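Ablation 3's pooling contrast can be illustrated in a few lines. This is a toy demonstration of why a distributed, sequence-level signal survives mean pooling but not last-token pooling; the function names and synthetic "identity direction" are assumptions, not the released code:

```python
import numpy as np

def mean_pool(hidden, mask):
    # average hidden states over real (unmasked) tokens
    m = mask[:, None]
    return (hidden * m).sum(axis=0) / m.sum()

def last_token_pool(hidden, mask):
    # keep only the final real token's hidden state
    return hidden[int(mask.sum()) - 1]

def cos(u, v):
    return (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Synthetic demo: a fixed "identity" direction injected weakly into every
# token's hidden state, mimicking a distributed sequence-level property.
rng = np.random.default_rng(42)
seq_len, dim = 256, 64
direction = rng.normal(size=dim)
direction /= np.linalg.norm(direction)

hidden = rng.normal(size=(seq_len, dim)) + 0.5 * direction
mask = np.ones(seq_len)

cos_mp = cos(mean_pool(hidden, mask), direction)
cos_lp = cos(last_token_pool(hidden, mask), direction)
print(cos_mp, cos_lp)  # mean pooling recovers the direction far better
```

Averaging over tokens suppresses per-token noise by roughly √seq_len while the shared direction adds coherently, so the weak distributed signal dominates the mean-pooled vector but is lost in any single token.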

Reproduce

# Clone repository
git clone https://github.com/b102e/yar-attractor-experiment
cd yar-attractor-experiment

# Install dependencies
pip install -r requirements.txt

# Run primary experiment (requires GPU)
python run.py \
  --model meta-llama/Llama-3.1-8B-Instruct \
  --layers 8 16 24 \
  --seed 42

# Results saved to results/llama/

All experiments reproducible with seed=42. Cloud GPU cost: ~$3 total. See results/ for pre-computed activations and JSON outputs.

Citation

@misc{vasilenko2026identity,
  title  = {Identity as Attractor: Geometric Evidence for Persistent Agent Architecture in LLM Activation Space},
  author = {Vasilenko, Vladimir},
  year   = {2026},
  url    = {https://arxiv.org/abs/2604.12016},
  note   = {arXiv:2604.12016 [cs.AI]}
}