Cassidy Laidlaw, Eli Bronstein, Timothy Guo, Dylan Feng, Lukas Berglund, Justin Svegliato, Stuart Russell, and Anca Dragan
ICML 2025
We develop a scalable approach to solving assistance games, which are an alternative paradigm to RLHF for training helpful and harmless assistants. We demonstrate our approach in a new environment, the Minecraft Building Assistance Game (MBAG), where an assistant helps a user build a house that is unknown to the assistant. Here, we provide code for MBAG and the experiments in the paper. We also present videos from our human study.
Assistance games are an alternative paradigm to RLHF for developing helpful and harmless AI assistants. In RLHF (top), an assistant policy maps the environment state (e.g., human chat messages) to an action (e.g., a response message) and is trained to maximize a reward function learned from human feedback. In contrast, in assistance games (bottom), the human is modeled as another agent acting in the same environment as the assistant, rather than as an exogenous source of feedback. The human and assistant share a reward function, but it depends on reward parameters that are initially known only to the human.
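To make the structural difference concrete, below is a minimal Python sketch of one episode of an assistance game. The `env`, `human`, and `assistant` interfaces are hypothetical illustrations, not part of the MBAG codebase. The key points are that the human acts inside the environment alongside the assistant, and that the shared reward depends on goal parameters theta that only the human observes.

```python
def run_assistance_game_episode(env, human, assistant, num_steps=100):
    """One episode of an assistance game (hypothetical interfaces).

    The human and assistant act in the same environment and receive a single
    shared reward that depends on goal parameters `theta` (e.g., the target
    house). `theta` is sampled privately for the human; the assistant never
    observes it and must infer it from the human's behavior.
    """
    theta = env.sample_reward_parameters()  # known only to the human
    state = env.reset()
    total_reward = 0.0
    for _ in range(num_steps):
        # The human conditions on both the state and its private goal theta;
        # the assistant conditions on the observable state alone.
        human_action = human.act(state, theta)
        assistant_action = assistant.act(state)
        state, done = env.step(human_action, assistant_action)
        # One shared reward, rather than a reward model learned from
        # human feedback as in RLHF.
        total_reward += env.reward(state, human_action, assistant_action, theta)
        if done:
            break
    return total_reward
```

In an RLHF loop, by contrast, only the assistant would act, and the reward term would be replaced by a learned reward model of the state and the assistant's action rather than a shared reward depending on the human's private goal.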
The code for MBAG and the experiments in the paper is available at https://github.com/cassidylaidlaw/minecraft-building-assistance-game. The README at that link contains instructions for running our assistants in MBAG.
All videos are shown at 2x real time.
Our assistant trained for MBAG using assistance games learns to exhibit a variety of behaviors that are helpful to users. Here, we show three examples of emergent behaviors from AssistanceZero, our algorithm for solving assistance games. These are the same three examples shown in Figure 1 of the paper.
Digging a foundation: the assistant watches the human outline the house's foundation. Then, the assistant breaks blocks within the outline and they finish the foundation together.
Building a roof: the assistant watches the human start building the roof of the house. Then, the human is able to work on other parts of the house while the assistant continues working on the roof.
Learning from corrections: the assistant has built the stone walls of the house one block too tall. The human breaks one of the incorrect blocks. The assistant learns from its mistake and helps the human break the remaining incorrect blocks.
In our human study, sixteen participants built houses under four conditions: building alone, building with our AssistanceZero assistant, building with an expert human assistant, and building with a supervised fine-tuning (SFT) assistant. Each participant was assigned a different house and built that same house under all four conditions. The assistants could not directly see the goal structure during the study; only the participants could see their goal, rendered as a transparent blueprint, while they played. None of the assistants, including the human assistant, had previously seen the houses. Here, we show two participants from the study under all four conditions.
No assistant
AssistanceZero assistant
Expert human assistant
SFT assistant
No assistant
AssistanceZero assistant
Expert human assistant
SFT assistant