Dereferencing
The operation of accessing the value or content associated with a variable or memory address. In classical computer architectures, information is often stored and organized using indirect addressing, where one memory location contains a pointer (an address) that refers to another memory location holding the actual data of interest. Dereferencing follows this chain of references to retrieve the ultimate value, enabling a flexible and efficient organization of information.
For example, when dereferencing variable 'a' that points to 'b', which contains the value 5, the system first accesses the memory location associated with 'a' to obtain the pointer to 'b', then uses that pointer to access the actual value 5. This two-step process allows the same value to be accessed through different variables, and values can be modified without changing the variables that refer to them. In cognitive systems, dereferencing plays a crucial role in accessing stored information about the world, allowing organisms to retrieve and use previously acquired knowledge in a flexible manner.
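This two-step process can be sketched in a few lines of Python, with a dictionary standing in for memory (the names and values are just the example above, not anything from our implementation):
memory = {"a": "b", "b": 5}   # 'a' stores a pointer to 'b'; 'b' stores the value

pointer = memory["a"]     # step 1: read 'a' to obtain the pointer to 'b'
value = memory[pointer]   # step 2: follow the pointer to reach the actual value
assert value == 5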
Direct Copy
An alternative approach to variable binding where values are directly copied rather than referenced through pointers. When a variable needs to be bound to a value, the system creates a complete copy of that value and associates it directly with the variable, eliminating the need for pointers and indirect addressing. However, this approach becomes impractical in real-world applications due to substantial memory overhead, as each variable binding requires a complete copy of the value. Additionally, maintaining consistency becomes challenging when values need to be updated, as the system must locate and modify all copies across all variable bindings.
The limitations become particularly apparent in systems that need to implement productive symbol manipulation, where new relationships between variables and values must be created dynamically. The rigid nature of direct copying makes it unsuitable for representing hierarchical or recursive structures, where variables often need to reference other variables.
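A small Python sketch (illustrative only, not part of our experimental setup) makes the consistency problem concrete: with pointer-style binding, one update is visible through every variable, while with direct copies each copy must be updated by hand:
shared = [5]             # one stored value
a = shared               # 'a' and 'b' both reference the same value
b = shared
shared[0] = 7            # a single update is seen through every reference
assert a[0] == b[0] == 7

a_copy = [5]             # direct copy: each binding gets its own value
b_copy = list(a_copy)
a_copy[0] = 7            # updating one copy leaves the other stale
assert b_copy[0] == 5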
Directed Graph
A mathematical structure that can be used to represent programs in our experiment, where nodes represent variables and constants, and directed edges represent assignment relationships. Each variable assignment in a program corresponds to an edge pointing from the assigned variable to the value or variable on the right-hand side of the assignment. For example, the assignment 'a=b' creates an edge from node 'a' to node 'b', while 'b=5' creates an edge from node 'b' to a constant node '5'. You can visualize these relationships using our Program Graph Visualizer.
The variable binding task can be understood as a graph traversal problem. To find a variable's value, one must follow the edges (assignments) from the starting variable node until reaching a constant node, always taking the most recent assignment when multiple assignments to the same variable exist. The presence of distractor chains in the program creates additional paths in the graph that must be ignored during traversal. This graph-theoretic view provides insight into the computational structure of variable binding.
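A minimal Python sketch of this traversal (assuming one assignment per line, as in our programs) builds the edge set and follows it to a constant; because later lines overwrite earlier entries, the "most recent assignment" rule falls out of the construction:
def resolve(program_lines, query):
    edges = {}
    for line in program_lines:
        lhs, rhs = (s.strip() for s in line.split("="))
        edges[lhs] = rhs          # later assignments overwrite earlier ones
    node = query
    while node in edges:          # follow edges until reaching a constant
        node = edges[node]
    return node

# the x/y distractor chain is simply never visited by the traversal
assert resolve(["a = b", "b = 5", "x = y", "y = 2"], "a") == "5"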
The graph representation also helps visualize important program features like referential depth (length of the path from queried variable to final value) and distractor chains (paths that branch from or merge with the main path). Understanding programs as directed graphs helps explain why the task requires genuine variable binding: the system must maintain the distinct identity of nodes while following edges in a systematic way.
Distractor Chains
Variable assignment chains that are irrelevant to determining the queried variable's value in our experiment. These chains come in two forms: those completely independent of the queried variable's chain, and those that branch off from or merge with it. The presence of distractor chains in programs tests the model's ability to selectively process relevant information while ignoring irrelevant assignments. Distinguishing relevant chains from distractors demonstrates the model's capacity for maintaining the distinct identities of variables and their bindings. You can analyze how the model handles distractor chains using our Program Analysis tools.
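For instance, in the following illustrative program (constructed here for exposition, not drawn from our dataset), the queried chain is a → b → 5; the x/y chain is fully independent, while the final assignment branches off the queried chain without affecting its value:
a = b
x = y
b = 5
y = 2
c = a
#a: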
Indirect Addressing
A mechanism in computing systems that enables access to stored information through multiple levels of reference. Instead of directly specifying the location where a value is stored, the system first accesses an intermediate address that contains a pointer to the final location of the desired value. This creates a chain of references that must be followed to reach the final value, involving an initial address as the entry point, the pointer stored at that initial address, and the final address where the target value resides.
The power of indirect addressing comes from its ability to separate the identity of a variable from its current value while maintaining their connection through pointer relationships. This separation enables dynamic updating of variable bindings and the construction of arbitrary data structures. It is often taken to be a fundamental requirement for any system that needs to flexibly store and manipulate structured information, contrasting with the limitations of direct copy approaches.
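The mechanism can be sketched over a flat memory array in Python (the addresses below are made up for illustration):
memory = [0] * 8
memory[3] = 42      # the target value resides at address 3 (the final address)
memory[5] = 3       # address 5 (the initial address) stores a pointer to it

entry = 5
pointer = memory[entry]   # read the pointer stored at the entry point
value = memory[pointer]   # follow it to the final location
assert value == 42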
Queried Variable
The variable whose value needs to be determined in our experiment, specified in the final line of each program using the format:
#variable:
The queried variable represents the target of the dereferencing task, requiring the model to traverse a chain of variable assignments to determine its final value. This traversal tests the model's ability to maintain and manipulate structured information, as it must track variable bindings across multiple steps while ignoring irrelevant distractor chains. The ability to correctly resolve the queried variable's value demonstrates the model's capacity for implementing symbolic computation and variable binding mechanisms. You can visualize how the model processes queried variables using our Program Graph Visualizer.
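Extracting the queried variable from a program is straightforward; a minimal Python sketch, assuming one statement per line and the '#variable:' format described above:
def queried_variable(program: str) -> str:
    last_line = program.strip().splitlines()[-1]   # e.g. "#x:"
    return last_line.removeprefix("#").removesuffix(":")

assert queried_variable("a = b\nb = 5\n#a:") == "a"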
Read/Write Memory
In Transformer architectures, read/write memory is implemented through the residual stream, which functions as a high-dimensional communication channel between different components of the model. This provides an addressable space where different components can read from and write to specific subspaces through learned linear projections. The residual stream accumulates layer outputs additively, allowing information to flow through the model while maintaining the ability for different layers to communicate through distinct subspaces of this shared channel.
The high dimensionality of the residual stream enables multiple pieces of information to be stored and transmitted simultaneously in different subspaces, which can be thought of as independent communication channels. Information written to a subspace persists until actively overwritten or modified by another component, similar to traditional computer memory. This architecture enables Transformers to maintain and manipulate information over long sequences and implement sophisticated algorithms through compositions of attention heads, providing many of the same core capabilities as traditional computer memory systems – addressable storage, selective access, and persistence of information.
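A toy numpy sketch (not our model's actual weights) shows how additive writes into orthogonal subspaces behave like independent channels that matching read projections can later recover:
import numpy as np

rng = np.random.default_rng(0)
d_model, d_sub = 16, 4

# two subspaces with mutually orthonormal directions, standing in for
# the learned projections of two different components
Q, _ = np.linalg.qr(rng.normal(size=(d_model, 2 * d_sub)))
write_1, write_2 = Q[:, :d_sub], Q[:, d_sub:]

msg_1 = rng.normal(size=d_sub)
msg_2 = rng.normal(size=d_sub)

stream = write_1 @ msg_1 + write_2 @ msg_2     # additive writes to the stream
assert np.allclose(write_1.T @ stream, msg_1)  # each reader recovers only the
assert np.allclose(write_2.T @ stream, msg_2)  # message in its own subspace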
Referential Depth
The number of assignment steps between the queried variable and its final value. This metric quantifies the complexity of variable dereferencing required to solve a particular program. For example, in the sequence:
x = y
y = z
z = 3
#x:
The referential depth is 3, as it takes three steps to get from x to the final value 3. Referential depth is an important aspect of program complexity, testing the model's ability to maintain and follow longer chains of variable references. In our experiment, we trained our neural networks on programs with referential depth up to 4, so that perfect test set performance reflects a capacity to systematically track and resolve multiple levels of indirect addressing. You can analyze how the model performs with different referential depths using our Checkpoint Stats page.
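Given the bindings as a dictionary, referential depth reduces to counting traversal steps; a minimal sketch in Python:
def referential_depth(bindings, query):
    depth, node = 0, query
    while node in bindings:     # stop once we reach a constant
        node = bindings[node]
        depth += 1
    return depth

assert referential_depth({"x": "y", "y": "z", "z": "3"}, "x") == 3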
Residual Stream
In a Transformer architecture, the residual stream is a high-dimensional vector space that serves as the backbone of information flow through the model. It stores token embeddings at each position and acts as a communication channel between different layers, accumulating layer outputs additively. This structure allows information to flow through the model while enabling different layers to communicate through distinct subspaces of the shared channel.
The residual stream has a deeply linear structure where each layer reads information through linear transformations at the start, processes that information, and writes its output back through another linear transformation. This architecture implements a form of read/write memory, where different components can access and modify specific subspaces through learned projections. The lack of a privileged basis means the specific subspaces used for communication are learned during training and can be rotated without changing the model's behavior, differing from traditional computer memory with fixed addressing schemes while maintaining similar core capabilities. You can analyze how the model's residual stream evolves during training using our Logit Analysis page.
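The absence of a privileged basis can be seen in a small numpy check (a toy calculation, not our model's weights): rotating the stream by any orthogonal matrix while counter-rotating every component's projection leaves all reads unchanged:
import numpy as np

rng = np.random.default_rng(0)
d = 16
x = rng.normal(size=d)                  # residual stream state
w_read = rng.normal(size=(4, d))        # some component's read projection
rot, _ = np.linalg.qr(rng.normal(size=(d, d)))   # arbitrary orthogonal rotation

# the rotated model computes exactly the same thing
assert np.allclose(w_read @ x, (w_read @ rot.T) @ (rot @ x))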
Variable Binding
A fundamental computational mechanism that creates associations between variables (abstract roles) and their values (specific instances) in a way that makes these associations computationally accessible. Variable binding requires two essential components: a mechanism for storing variable-value associations and a mechanism for accessing these associations. This separation enables the construction of general-purpose computational procedures that can operate on arbitrary inputs.
The computational significance of variable binding lies in its ability to separate the machinery that performs computations from the specific values involved in those computations. When a computational process references a variable, it can operate on whatever value is currently bound to that variable, allowing the same computational machinery to work with different values without modification. This capacity is important for representing structured information, where complex structures can be represented as sets of variable bindings, with each variable representing a role or position within the structure, and the bound values representing the elements filling those roles. You can explore how our model learns variable binding through the Training Trajectory page.
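The two components can be made concrete in a short Python sketch (the class and names are hypothetical, chosen only to mirror the definition): a store for variable-value associations plus an access mechanism, with the same procedure operating on whatever value is currently bound:
class Environment:
    def __init__(self):
        self.bindings = {}               # storage of variable-value pairs

    def bind(self, variable, value):
        self.bindings[variable] = value

    def lookup(self, variable):          # access mechanism
        return self.bindings[variable]

def double(env):
    return env.lookup("n") * 2           # operates on whatever 'n' holds

env = Environment()
env.bind("n", 3)
assert double(env) == 6
env.bind("n", 10)    # rebinding requires no change to the procedure
assert double(env) == 20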
Variable Chain
A sequence of variable assignments that must be followed to find a variable's final value. Variable chains represent the core structure that the model must learn to process in order to implement genuine variable binding and dereferencing. For example, in the sequence:
a = b
b = c
c = 5
The variable chain for 'a' would be: a → b → c → 5. These chains essentially create a directed graph structure where nodes are variables and edges are assignment relationships. The ability to correctly traverse these chains while maintaining relevant variable bindings and ignoring distractors demonstrates the model's capacity for implementing symbolic computation through learned mechanisms rather than built-in architectural features. You can visualize these chains using our Program Graph Visualizer.