State as Fixed-Size Context

Learn how to model game state, player view, and actions with Zod schemas and a DSL, eliminating context limits, unreliability, and rule enforcement issues.

Overview

LLM applications often rely on a long-running conversation context (including content from documents and the conversation with the user so far) to keep track of the information needed to respond to the user’s next requests sensibly.

This is a very difficult problem when approached from the point of view of extracting the relevant information from a conversation context. One approach is RAG (retrieval-augmented generation), where an embedding of the conversation context is queried for the most relevant snippets to include in the context for the next response to the user. Another approach is to maintain a rolling summary of the most important points of the conversation so far, and to include that summary in the context for the next response to the user. Approaches of this kind share two main issues:

unreliability: it’s impossible to guarantee that an important piece of information will be extracted from the context
non-scalability: although these approaches scale better than including as much of the original conversation as possible in the context used to generate a response, their performance still degrades as the original conversation grows

One LLM application I’ve been interested in, where these problems come up, is text adventure games. A well-known example is AI Dungeon, which suffered from both unreliability and non-scalability, as well as two further problems:

ephemeral state: application state becomes distorted by whatever method is used to re-include it into context when necessary (if it is retrieved at all), making particular bits of information ephemeral over time
enforced rules: since everything is expressed in natural language, it is difficult for the application designer to enforce any rules on user behavior (even careful natural-language instructions are not enough, as Lakera’s Gandalf prompting challenges illustrate)

In this presentation, I demonstrate an alternative approach that addresses all of these problems in building an LLM text adventure game. The central idea is to formalize three things: the application state (e.g. the game world state) as structured data; the user’s fixed-size view of this state (e.g. their player information, inventory, and immediate surroundings); and an enumerated set of actions, in the form of a DSL (domain-specific language), that the user can take to affect the state (e.g. the actions available to the player). All of these formalizations are written as Zod schemas with natural-language descriptions, taking particular advantage of schema unions with enum tags, e.g. const BooleanOrNumber = z.union([ z.object({ type: z.enum(["boolean"]), value: z.boolean() }), z.object({ type: z.enum(["number"]), value: z.number() }) ]);.
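To make the idea of state and action schemas concrete, here is a minimal sketch using Zod. The schema names (Item, Room, World, Action) and their fields are hypothetical, chosen for illustration only, and are not taken from the project’s repository. The action DSL follows the same enum-tagged union pattern as the BooleanOrNumber example.

```typescript
import { z } from "zod";

// Hypothetical world state. This can grow without bound as the game proceeds.
const Item = z.object({
  name: z.string().describe("Unique name of the item"),
  description: z.string().describe("One-sentence description of the item"),
});

const Room = z.object({
  name: z.string().describe("Unique name of the room"),
  items: z.array(Item).describe("Items lying in the room"),
  exits: z.array(z.string()).describe("Names of directly reachable rooms"),
});

const World = z.object({
  rooms: z.array(Room).describe("Every room in the game world"),
  player: z.object({
    room: z.string().describe("Name of the room the player is in"),
    inventory: z.array(Item).describe("Items the player is carrying"),
  }),
});
type World = z.infer<typeof World>;

// Hypothetical action DSL: a union of objects, each tagged by an enum `type`.
// The LLM is asked to produce a value of this shape, never free-form text.
const Action = z.union([
  z.object({
    type: z.enum(["move"]),
    room: z.string().describe("Name of the room to move to"),
  }),
  z.object({
    type: z.enum(["take"]),
    item: z.string().describe("Name of an item in the current room"),
  }),
  z.object({
    type: z.enum(["drop"]),
    item: z.string().describe("Name of an item in the inventory"),
  }),
]);
type Action = z.infer<typeof Action>;

// Anything the model returns is validated before it can touch the state.
const candidate: unknown = { type: "take", item: "lamp" };
const parsed = Action.safeParse(candidate);
if (parsed.success) {
  console.log("valid action:", parsed.data.type);
} else {
  console.log("rejected: not a valid action");
}
```

Because every action must parse against the union, an output that does not match one of the enumerated shapes is rejected at the schema boundary rather than being interpreted loosely.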

In this way, the LLM always has access to the most important information for deciding how to interpret the user’s prompt, and the size of that information stays fixed no matter how many conversation turns the user has taken. The application state is maintained even when the player is only viewing a fragment of it. And because the LLM can only interpret the user’s prompt into a DSL for interacting with the state, the application designer decides the rules for interacting with the state in a way that is transparent to the user.
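The view and the rule enforcement described above can be sketched in plain TypeScript. Everything here is an assumption made for illustration: the type names, the viewOf and applyAction helpers, and the two sample rules (moving only through exits, taking only items that are present) are hypothetical and not the project’s actual implementation. In a Zod-based application the types would be inferred from the schemas instead of written by hand.

```typescript
// Hypothetical hand-written types standing in for Zod-inferred ones.
type Item = { name: string };
type Room = { name: string; items: Item[]; exits: string[] };
type WorldState = { rooms: Room[]; player: { room: string; inventory: Item[] } };
type PlayerAction =
  | { type: "move"; room: string }
  | { type: "take"; item: string };
type PlayerView = { room: Room; inventory: Item[] };
type ApplyResult =
  | { ok: true; world: WorldState }
  | { ok: false; reason: string };

function currentRoom(world: WorldState): Room {
  const room = world.rooms.find((r) => r.name === world.player.room);
  if (room === undefined) throw new Error(`unknown room: ${world.player.room}`);
  return room;
}

// The view passed to the LLM: only the current room and the inventory,
// however many rooms or past turns the full state contains.
function viewOf(world: WorldState): PlayerView {
  return { room: currentRoom(world), inventory: world.player.inventory };
}

// Rules live in ordinary code, so no prompt can talk its way around them.
// Returns a new state and leaves the input state untouched.
function applyAction(world: WorldState, action: PlayerAction): ApplyResult {
  const here = currentRoom(world);
  switch (action.type) {
    case "move": {
      if (!here.exits.includes(action.room)) {
        return { ok: false, reason: `no exit from ${here.name} to ${action.room}` };
      }
      return { ok: true, world: { ...world, player: { ...world.player, room: action.room } } };
    }
    case "take": {
      const item = here.items.find((i) => i.name === action.item);
      if (item === undefined) {
        return { ok: false, reason: `no ${action.item} in ${here.name}` };
      }
      const rooms = world.rooms.map((r) =>
        r.name === here.name ? { ...r, items: r.items.filter((i) => i !== item) } : r,
      );
      const inventory = [...world.player.inventory, item];
      return { ok: true, world: { rooms, player: { ...world.player, inventory } } };
    }
  }
}

const sampleWorld: WorldState = {
  rooms: [
    { name: "Cellar", items: [{ name: "lamp" }], exits: ["Hall"] },
    { name: "Hall", items: [], exits: ["Cellar"] },
    { name: "Tower", items: [], exits: [] },
  ],
  player: { room: "Cellar", inventory: [] },
};

// The Tower exists in the world but is not reachable from the Cellar.
console.log(applyAction(sampleWorld, { type: "move", room: "Tower" }));
```

Each turn would then consist of giving the model viewOf(world) plus the user’s prompt, validating the returned action, and calling applyAction. A rejected action can be reported back to the player with its reason.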

The presentation will be organized as follows:

Introduction of the problems: unreliability, non-scalability, ephemeral state, enforced rules
Demonstration of the text adventure game that implements some techniques that address these problems
Technical walkthrough of how the app implements these techniques, in particular the formalization of state, view, and action

Links

https://github.com/rybla/small-language-games
Next.js/TypeScript LLM experiments, developed and run with Bun.

Tech stack