GIDS series

preprint · part 3 · 4 of 7

Part 3: Application — Predicting How People Behave

Part 2 established the person-side object. It distinguished the inherited template from the realized individual embedding, the realized embedding from the momentary phenomenal state, and the momentary phenomenal state from the task-conditioned predictive state the mathematics can actually handle. Part 3 now asks what can be done with that object.

The claim of this section is narrower than a final proof about mind. It is not that we already possess the exact universal transition law for human beings. It is that we can define a mathematical program in which observer, role, proposition, and environment can be placed into a shared representational space, and that inside this space we can iteratively improve our estimate of how one predictive observer-state gives rise to the next. In other words: this section does not complete the science; it specifies the playground in which the science can be built.

Everything the model computes on ends up as vectors or tuples of vectors; however, not every symbol in the paper is itself a primitive vector before preprocessing. Raw inputs can be text, categories, histories, metadata, and context objects. Before they hit the learned equations, they must be converted into vector or tensor representations; a convenient shortcut is to run any raw input through your favorite multi-input embedding model. For bags of categorical observations, such as a “life-history” trace, averaging the member embeddings yields a decent latent representation. Beware: implemented carelessly, the averaging trick can turn your model into hot garbage.
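A minimal sketch of the averaging trick in Python. The `embed` function is a hypothetical stand-in for whatever embedding model you actually use; the real content is the per-member normalization and the empty-bag guard, two of the easiest ways to get the trick wrong.

```python
import numpy as np

def embed(text: str, dim: int = 8) -> np.ndarray:
    """Hypothetical stand-in for a real multi-input embedding model.
    Hashes the text into a deterministic-within-run unit vector so the
    sketch is runnable without any external model."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)

def embed_category_bag(labels: list[str], dim: int = 8) -> np.ndarray:
    """Average-of-embeddings for a categorical bag such as a life-history.
    Normalizing each member first keeps one loud label from dominating;
    an empty bag maps to the zero vector instead of crashing."""
    if not labels:
        return np.zeros(dim)
    vecs = np.stack([embed(label, dim) for label in labels])
    mean = vecs.mean(axis=0)
    norm = np.linalg.norm(mean)
    return mean / norm if norm > 0 else mean
```

Renormalizing the mean keeps downstream cosine comparisons on a common scale; skipping that step is one way the "hot garbage" failure mode shows up.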

Towards a Universal State Transition Function

In the ideal philosophical form of the theory, the object of interest is still the next phenomenal state.

Let

$$ \phi_{i,t} $$

denote the full phenomenal state of person \(i\) at time \(t\), and let

$$ T_i $$

denote the realized transcendental embedding of that person. Let

$$ x_t $$

denote the proposition confronting the observer at time \(t\). Here “proposition” is meant broadly. It may be a sentence, an email, a product, a person, a meeting, a threat, a market signal, or a whole local arrangement of circumstances. For the observer, what matters is not bare matter but presented structure.

The ideal transition law is therefore

$$ \phi_{i,t+1} = F(T_i, \phi_{i,t}, x_t). $$

I keep this equation because it is the motivating ideal. I do not keep it as the formal target of the benchmark, because that would be me quietly pretending I have direct access to the thing I explicitly said I do not have.

So the formal target from here on is the task-conditioned predictive observer-state.

Let

$$ \mathcal U = \{u=(u_1,u_2,\dots)\} $$

be an ambient infinite coordinate space of possible distinctions. Nothing in the present argument requires stronger structure than this. The point of \(\mathcal U\) is simply to provide a common representational arena large enough to host observer, proposition, role, and world.

For any prediction task \(\tau\), define a finite projection

$$ \Pi_\tau:\mathcal U \to \mathbb R^{d_\tau}, $$

where \(d_\tau\) is the number of coordinates needed for that task. This preserves the core idea of infinite dimensional space without requiring infinite computation. The ambient space is open-ended; each task lives in a finite slice of it.

The formal state is

$$ q_{i,t}^{(\tau,\Delta)}, $$

chosen so that, for admissible propositions,

$$ P\!\left( Y_{i,t+\Delta}^{(\tau)} \mid H_{i,\le t}, T_i, c_{i,t}, w_t, x_t \right) = P\!\left( Y_{i,t+\Delta}^{(\tau)} \mid q_{i,t}^{(\tau,\Delta)}, x_t \right). $$

The measurable approximation is

$$ s_{i,t}^{(\tau,\Delta)} = (\hat T_i, z_{i,t}, c_{i,t}, w_t). $$

The operational transition law is then

$$ \hat q_{i,t+1}^{(\tau,\Delta)} = F_\tau(\hat T_i, z_{i,t}, c_{i,t}, w_t, x_t; \theta), $$

and the observable readout is

$$ \hat y_{i,t+\Delta}^{(\tau)} = R_0^{(\tau)}\!\left(\hat q_{i,t+1}^{(\tau,\Delta)}\right). $$

If we want to keep the stronger language without lying to ourselves, the clean interpretation is this: the next predictive observer-state is the formally accessible stand-in for the part of phenomenal transition that matters to the task.

It is useful to decompose the transition map into three pieces. First, encode the observer-side object:

$$ o_{i,t}^{(\tau)} = E_o^{(\tau)}(\hat T_i, z_{i,t}, c_{i,t}, w_t). $$

Second, encode the proposition:

$$ p_t^{(\tau)} = E_p^{(\tau)}(x_t). $$

Third, compute the interaction:

$$ h_{i,t}^{(\tau)} = \Psi_\tau(o_{i,t}^{(\tau)}, p_t^{(\tau)}), $$

and decode the next predictive state:

$$ \hat q_{i,t+1}^{(\tau,\Delta)} = G_\tau(h_{i,t}^{(\tau)}). $$

Finally, if the task demands an observable output or auxiliary readouts, decode them from the predictive state:

$$ \hat y_{i,t+\Delta}^{(\tau)} = R_0^{(\tau)}(\hat q_{i,t+1}^{(\tau,\Delta)}), \qquad \hat a_{i,t+\Delta}^{(m,\tau)} = R_m^{(\tau)}(\hat q_{i,t+1}^{(\tau,\Delta)}). $$

Whenever I write \(\hat a_{i,t+\Delta}^{(\tau)}\) without the probe index \(m\), I mean the full auxiliary bundle \((\hat a_{i,t+\Delta}^{(1,\tau)}, \dots, \hat a_{i,t+\Delta}^{(M,\tau)})\).

This last step matters. The reply, the purchase, the rejection, the delay, the meeting, or the concession is not the state itself. It is a visible residue of the transition. The model is not fundamentally about “will they buy?” It is about “what state will they enter next for the purposes of this task?” The purchase is then one possible readout from that state.
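The three-piece decomposition can be sketched end to end. Everything here is a placeholder: the learned modules \(E_o^{(\tau)}\), \(E_p^{(\tau)}\), \(\Psi_\tau\), \(G_\tau\), and \(R_0^{(\tau)}\) are stubbed as fixed random linear maps with tanh nonlinearities, and the dimension is an arbitrary choice, but the data flow matches the equations.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 16  # task dimension d_tau (arbitrary)

# Hypothetical learned modules, stubbed as fixed random linear maps.
W_o = rng.standard_normal((D, 4 * D))  # observer encoder E_o
W_p = rng.standard_normal((D, D))      # proposition encoder E_p
W_g = rng.standard_normal((D, 2 * D))  # decoder G_tau
w_r = rng.standard_normal(D)           # readout head R_0

def encode_observer(T_hat, z, c, w):
    """E_o: fuse slow embedding, fast state, role-context, and world state."""
    return np.tanh(W_o @ np.concatenate([T_hat, z, c, w]))

def encode_proposition(x):
    """E_p: map the proposition into the same task space."""
    return np.tanh(W_p @ x)

def next_state(o, p):
    """Psi_tau as concatenation, then G_tau decodes the next predictive state."""
    h = np.concatenate([o, p])  # interaction h_{i,t}
    return np.tanh(W_g @ h)

def readout(q):
    """R_0: one observable residue of the transition, e.g. reply likelihood."""
    return 1.0 / (1.0 + np.exp(-(w_r @ q)))
```

The readout is deliberately last and separate: swapping the interaction for attention or a state-space block changes `next_state` without touching the heads.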

God’s Infinite Dimensional Space: Making All Realities Composable

“To make all realities composable” does not mean every distinction matters for every observer. It means that anything capable of altering the observer’s next predictive state can be represented within the same formal program.

The observer remains the privileged object. The world matters only insofar as it enters the observer’s state transition.

If two propositions differ physically but not in the task-relevant projection available to the observer, then they are equivalent for that task. Formally, if

$$ \Pi_{\tau}(E_p(x_t^{(1)})) = \Pi_{\tau}(E_p(x_t^{(2)})), $$

then, for fixed \(\hat T_i\), \(z_{i,t}\), \(c_{i,t}\), and \(w_t\),

$$ F_\tau(\hat T_i, z_{i,t}, c_{i,t}, w_t, x_t^{(1)}; \theta) \approx F_\tau(\hat T_i, z_{i,t}, c_{i,t}, w_t, x_t^{(2)}; \theta). $$

This captures a number of cases at once. Dark matter may exist in the ambient space and yet project to nothing for the ordinary human observer. A table made of wood and a table made of stone may differ in physical constitution, yet remain interchangeable for a task in which that distinction never enters the observer’s next state. The world is therefore not composable because everything is the same; it is composable because differences can be represented, weighted, or nullified within one algebra.
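Under the simplifying assumption that \(\Pi_\tau\) keeps only the first \(d_\tau\) coordinates (a learned projection would be more general), the equivalence condition reduces to a cheap check:

```python
import numpy as np

def project(v: np.ndarray, d_tau: int) -> np.ndarray:
    """Pi_tau, modeled here as keeping the first d_tau coordinates."""
    return v[:d_tau]

def task_equivalent(e1: np.ndarray, e2: np.ndarray, d_tau: int,
                    tol: float = 1e-8) -> bool:
    """Two encoded propositions are interchangeable for the task iff their
    task-relevant projections coincide (up to tolerance)."""
    return bool(np.allclose(project(e1, d_tau), project(e2, d_tau), atol=tol))

# Wood table vs. stone table: they differ only in a coordinate the task ignores.
wood = np.array([1.0, 0.5, 0.0, 7.0])   # last coordinate: material
stone = np.array([1.0, 0.5, 0.0, 3.0])
```

With \(d_\tau = 3\) the two tables are equivalent for the task; widen the projection to include the material coordinate and the equivalence breaks, which is exactly the "differences can be weighted or nullified" point.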

The same framework scales across levels. An amoeba may be modeled with only a handful of coordinates: nutrient, toxin, gradient, rupture, motion. A dog adds more. A human adds far more. A corporation, later on, may be treated as a higher-order observer if it can be represented as a system with memory, incentives, internal communication, and persistent response surfaces. The universal claim is therefore about form, not about one fixed content-size. For the present paper, however, we restrict the application to humans situated within roles.

At the human level, the person-side estimate begins with standardized features. Let

$$ u_i = [p_i \mid b_i \mid \ell_i \mid r_i \mid h_i \mid g_i^{\mathrm{grp}}], $$

where \(p_i\) denotes psychometric and cognitive proxies, \(b_i\) biography, \(\ell_i\) language and cultural position, \(r_i\) role and institution history, \(h_i\) life-event structure, and \(g_i^{\mathrm{grp}}\) longer-run group or company variables relevant to the person. Let \(g_i^{\mathrm{slow}}\) denote the slow source-aware and regime-aware categorical trace bank built from repeated role labels, objection classes, action-types, topical recurrences, counterpart identities, and similar discrete observations. Then

$$ \hat T_i = E_\theta(u_i, g_i^{\mathrm{slow}}) $$

is the first operational estimate of the transcendental embedding.

This estimate is not final. It is the opening move in a discovery process. The correct coordinates for a task are not assumed in advance; they are tested.

For any candidate feature family \(f\), define its contribution to a task \(\tau\) as

$$ \Delta_\tau(f) = \mathrm{Perf}_\tau(M \cup f) - \mathrm{Perf}_\tau(M), $$

where \(M\) is the current model and \(\mathrm{Perf}_\tau\) is its performance on the task. If a feature family raises predictive validity in a stable way, it remains. If it does not, it is removed. The theory therefore does not begin by declaring the final human embedding solved. It begins by specifying the logic by which that embedding is approximated and improved.
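The discovery rule is an ablation-style comparison, and a greedy version of the keep-or-remove loop fits in a few lines. The evaluation function \(\mathrm{Perf}_\tau\) is stubbed as a callable; real use would wrap a full train-and-validate cycle and check stability across resamples before a family is kept.

```python
def feature_contribution(perf, model_features: set, family: str) -> float:
    """Delta_tau(f): task performance with the candidate family minus without.
    `perf` maps a feature set to a scalar score (higher is better)."""
    return perf(model_features | {family}) - perf(model_features)

def greedy_discovery(perf, candidates: list, threshold: float = 0.0) -> set:
    """Keep a family only if it raises predictive validity; otherwise drop it."""
    kept: set = set()
    for family in candidates:
        if feature_contribution(perf, kept, family) > threshold:
            kept.add(family)
    return kept
```

On a toy scorer that rewards exactly two families, the loop keeps those two and drops the rest; the greedy order matters in general, which is one reason stability checks belong in a production version.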

This is why the first operational analogue is closer to a recommender system than to a completed metaphysics. A recommender infers a user representation from traces, maps a candidate object into a related space, predicts the next response, and updates from feedback. The present framework generalizes that move: infer a person representation, maintain an explicit local state, map a proposition into the same task-space, estimate the next predictive observer-state, and update the estimate from observed consequences.

Creating the World Model

Figure 5. Proposed latent-state world model with slow person embedding, fast recurrent state, proposition encoder, interaction state, and readout heads. A world model, in the present sense, is not a copy of physical reality. It is a learned simulator of predictive-state transitions under propositions.

For a task \(\tau\), define the world model as

$$ \mathcal W_\tau:(\hat T_i, z_{i,t}, c_{i,t}, w_t, x_t)\mapsto \hat q_{i,t+1}^{(\tau,\Delta)}. $$

This is the single-step form. Multi-step rollout is obtained by recursion:

$$ \hat q_{i,t+k}^{(\tau,\Delta)} = \mathcal W_\tau^{(k)}(\hat T_i, z_{i,t}, c_{i,t}, w_t, x_t, x_{t+1}, \dots, x_{t+k-1}). $$

The model can later be extended to multiple agents by replacing the single observer with a set of observer-state tuples, but the present paper keeps one observer at the center because that is enough to ground the framework.
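The recursion behind \(\mathcal W_\tau^{(k)}\) can be written directly. The single-step model is passed in as a callable, and feeding the predicted state back in as the next fast state is one modeling choice among several, not a commitment of the framework.

```python
def rollout(step, T_hat, z0, c, w, propositions):
    """Apply the single-step world model W_tau over x_t, ..., x_{t+k-1},
    feeding each predicted state back in as the next fast state.
    Returns the trajectory of predicted states q_{t+1}, ..., q_{t+k}."""
    states, z = [], z0
    for x in propositions:
        q = step(T_hat, z, c, w, x)  # one application of W_tau
        states.append(q)
        z = q                        # predicted state becomes next fast state
    return states
```

The slow embedding, role-context, and world state are held fixed across the horizon here; a longer-horizon variant would let \(w_t\) evolve too.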

The first application domain is go-to-market interaction because it produces repeated transitions, clear timestamps, and measurable outcomes. Consider a founder receiving a sales pitch. The founder is not just a person. The founder is a person-in-role, with a company, a market position, a prior history, a threat model, a time horizon, and live incentives. The proposition is not just the seller’s sentence. It includes the seller, the product, the message, the channel, the price, the timing, the current market state, and the problem-frame through which the product is being introduced.

In this case, the person-side object is

$$ o_{i,t}^{(\tau)} = E_o^{(\tau)}(\hat T_i, z_{i,t}, c_{i,t}, w_t), $$

where \(c_{i,t}\) includes the founder role and relevant company state. The proposition-side object is

$$ p_t^{(\tau)} = E_p^{(\tau)}(x_t), $$

where \(x_t\) includes the seller, product, pitch, and local conditions. Their interaction produces the next predictive state, and that state can be read out as reply likelihood, meeting likelihood, objection class, cycle delay, purchase likelihood, sentiment shift, or any other observable the task legitimately carries.

In that sense, the eventual sales outcome is not the main object. It is the visible consequence of a prior state transition. If the email changed salience, trust, urgency, or perceived fit, then the transition already occurred before the purchase did.

Nothing in this framework commits us to one architecture. The transition operator \(\Psi_\tau\) may be implemented by a linear model, a recurrent model, an attention-based model, a state-space model, or a neural system that combines several of these. Sequence models in the Mamba family are one candidate because they compress long histories into an evolving state, but they do not define the theory. They are implementation options inside it.

What matters most at the outset is not architectural ambition but a working procedure.

Algorithm 1: Estimate the person-side object

Choose a task \(\tau\) and a time horizon \(\Delta\). Collect standardized person-level observations and build the slow categorical trace bank \(g_i^{\mathrm{slow}}\). Estimate

$$ \hat T_i = E_\theta(u_i, g_i^{\mathrm{slow}}). $$

Construct the current fast state \(z_{i,t}\) from event history together with the active categorical pool \(g_{i,t}^{\mathrm{fast},\tau}\), then attach the active role-context \(c_{i,t}\) and world state \(w_t\). This produces the operational state

$$ s_{i,t}^{(\tau,\Delta)} = (\hat T_i, z_{i,t}, c_{i,t}, w_t). $$

Algorithm 2: Encode the proposition and predict the transition

Encode the proposition \(x_t\) into

$$ p_t^{(\tau)} = E_p^{(\tau)}(x_t). $$

Compute the interaction

$$ h_{i,t}^{(\tau)} = \Psi_\tau(o_{i,t}^{(\tau)}, p_t^{(\tau)}), $$

then decode the next predictive state

$$ \hat q_{i,t+1}^{(\tau,\Delta)} = G_\tau(h_{i,t}^{(\tau)}), $$

and, when needed, produce observable readouts

$$ \hat y_{i,t+\Delta}^{(\tau)} = R_0^{(\tau)}(\hat q_{i,t+1}^{(\tau,\Delta)}), \qquad \hat a_{i,t+\Delta}^{(m,\tau)} = R_m^{(\tau)}(\hat q_{i,t+1}^{(\tau,\Delta)}). $$

Algorithm 3: Update from error

Observe the realized outcome \(y_{i,t+\Delta}^{(\tau)}\) and auxiliary probes \(a_{i,t+\Delta}^{(m,\tau)}\). Compute the training objective

$$ \mathcal L_\tau = \mathcal L_{\mathrm{main}} + \sum_{m=1}^{M}\lambda_m \mathcal L_{\mathrm{probe},m} + \lambda_{\mathrm{reg}}\Omega(\theta). $$

This is the ordinary minimize-by-gradient-descent version of the update. The probe heads are there to force reusable structure into the predictive state rather than letting the model survive on one brittle headline target, and the regularizer is there to keep the fit honest instead of rewarding parameter sprawl.

Update parameters by gradient step,

$$ \theta_{t+1} = \theta_t - \eta\nabla_\theta \mathcal L_\tau, $$

or, in Bayesian form,

$$ p(\theta\mid D_{1:t+1}) \propto p(D_{t+1}\mid \theta)\,p(\theta\mid D_{1:t}). $$

The purpose of the update is not only to improve the transition operator. It is also to refine the estimated embedding, the fast state update, the task projection, and the feature family retained for the task.
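A numeric sketch of the update, with squared-error stand-ins for the main and probe losses, an L2 regularizer for \(\Omega(\theta)\), and a finite-difference gradient so the example stays self-contained; none of these choices are prescribed by the framework.

```python
import numpy as np

def loss(theta, X, y_main, y_probes, lambdas, lam_reg):
    """L_tau = L_main + sum_m lambda_m * L_probe_m + lam_reg * Omega(theta),
    with squared error everywhere and a single shared linear head."""
    pred = X @ theta
    total = np.mean((pred - y_main) ** 2)
    for lam, y_p in zip(lambdas, y_probes):
        total += lam * np.mean((pred - y_p) ** 2)
    return total + lam_reg * np.sum(theta ** 2)

def numeric_grad(f, theta, eps=1e-6):
    """Central-difference gradient; a real system would backpropagate."""
    g = np.zeros_like(theta)
    for j in range(theta.size):
        e = np.zeros_like(theta)
        e[j] = eps
        g[j] = (f(theta + e) - f(theta - e)) / (2 * eps)
    return g

def sgd_step(theta, grad, eta=0.1):
    """theta_{t+1} = theta_t - eta * grad."""
    return theta - eta * grad
```

The probe terms pull the shared parameters toward structure that serves more than the headline target, which is exactly their stated job in the objective.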

Algorithm 4: Detect drift and reopen discovery

No implementation should be assumed stable forever. If the environment changes, if incentives shift, or if a once-inert distinction becomes active, model quality will decay. Let recent performance be compared against reference performance. When the drop is persistent, the current projection is no longer sufficient. At that point the system must reopen the discovery process: add candidate features, reweight existing ones, or rebuild the task projection.
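The persistence test can be as simple as a sliding window against a reference level; the tolerance and patience values below are illustrative, not recommendations.

```python
from collections import deque

class DriftDetector:
    """Flags drift when recent task performance sits persistently below a
    reference level, signalling that discovery should be reopened."""

    def __init__(self, reference: float, tol: float = 0.05, patience: int = 5):
        self.reference = reference   # performance on the reference window
        self.tol = tol               # allowed slack before a drop counts
        self.recent = deque(maxlen=patience)

    def update(self, perf: float) -> bool:
        """Record one evaluation; return True only when all of the last
        `patience` evaluations are degraded (persistent, not transient)."""
        self.recent.append(perf)
        window_full = len(self.recent) == self.recent.maxlen
        return window_full and all(p < self.reference - self.tol
                                   for p in self.recent)
```

Requiring the whole window to be degraded is what separates a persistent drop, which should reopen discovery, from one noisy evaluation, which should not.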

This is the correct sense in which the framework is open-ended. The framework is not falsified by a weak first implementation; specific implementations are. The framework supplies the lexicon and the update logic by which better implementations can be constructed.

This is the place where I stop pretending the point of the machinery is merely to admire prediction metrics.

The practical purpose of the framework is not only to forecast outcomes, but to compare admissible candidate propositions by their expected effect on the observer’s next task-relevant state and downstream objective. Otherwise, why the hell are we building it?

Let \(\mathcal X_{i,t}^{\mathrm{adm}}\) denote the admissible candidate proposition set for observer \(i\) at time \(t\). “Admissible” here just means allowable under the task, channel, platform, and whatever policy constraints you are actually operating under.

Let \(U_\tau\) be the utility for task \(\tau\), defined on the predicted next state and its readouts. Then the model-induced score of a proposition is

$$ \operatorname{score}_\theta(x \mid s_{i,t}^{(\tau,\Delta)}) = \mathbb E_\theta\!\left[ U_\tau\!\left( \hat q_{i,t+1}^{(\tau,\Delta)}, \hat y_{i,t+\Delta}^{(\tau)}, \hat a_{i,t+\Delta}^{(\tau)} \right) \mid s_{i,t}^{(\tau,\Delta)}, x \right]. $$

The corresponding proposition search problem is

$$ x_t^\star \in \arg\max_{x \in \mathcal X_{i,t}^{\mathrm{adm}}} \operatorname{score}_\theta(x \mid s_{i,t}^{(\tau,\Delta)}). $$

That gives us three distinct regimes.

First, there is forecasting: estimate the next predictive state and readout of the proposition that actually happened.

Second, there is observational ranking: simulate or score counterfactual candidate propositions under the model. This is useful and often already valuable.

Third, there is interventional policy improvement: choose propositions based on those scores and claim they will improve outcomes in the real world. This third step is not free. It requires logged propensities, randomized exploration, or online experimentation. Without that, what you have is ranking, not causal control.
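The second regime and the argmax selection over the admissible set can be sketched as follows; `score` is a placeholder closing over the current operational state, standing in for the learned expectation \(\mathbb E_\theta[U_\tau(\cdot)]\).

```python
def rank_propositions(admissible, score):
    """Regime two, observational ranking: sort candidates by model score.
    `score` closes over the current operational state s_{i,t}."""
    return sorted(admissible, key=score, reverse=True)

def best_proposition(admissible, score):
    """x_star: the argmax over the admissible set. Acting on this choice as
    causal control (regime three) additionally requires logged propensities,
    randomized exploration, or online experimentation."""
    ranked = rank_propositions(admissible, score)
    if not ranked:
        raise ValueError("empty admissible set")
    return ranked[0]
```

Keeping the admissibility filter outside these functions is deliberate: what counts as allowable is a policy constraint, not a property of the model.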

That distinction matters because this system is obviously drifting toward control. Better to say it cleanly than hide it in a footnote.

Closing Part 3

Part 1 argued that reality, as it appears to an organism, is not a mirror of noumena but the output of an evolved representational structure. Part 2 showed how that structure can be individualized, historically deformed, estimated from observable traces, and compressed into a task-conditioned predictive state. Part 3 completes the descent into mathematics by specifying the program that follows from those claims.

The ideal object remains the next phenomenal state. The practical object is a predictive observer-state. The observer, the role, the world, and the proposition can be embedded in a shared arena. Their interaction can be modeled as a transition. The resulting state can be rolled forward, compared against outcomes, decoded into auxiliary probes, and updated under error. This is the meaning of a transcendental world model in operational terms.

The framework can therefore be summarized in one line:

$$ (\hat T_i, z_{i,t}, c_{i,t}, w_t, x_t) \;\longrightarrow\; \hat q_{i,t+1}^{(\tau,\Delta)} \;\longrightarrow\; (\hat y_{i,t+\Delta}^{(\tau)}, \hat a_{i,t+\Delta}^{(\tau)}) \;\longrightarrow\; \text{update}. $$

And if proposition search is turned on, the line extends to

$$ s_{i,t}^{(\tau,\Delta)} \;\longrightarrow\; \operatorname{score}_\theta(x \mid s_{i,t}^{(\tau,\Delta)}) \;\longrightarrow\; x_t^\star. $$

That is the whole ambition of this section. Not a complete algebra of mind, but a way to build one without lying about what has and has not been solved.

OK, it is time to get serious now.