Why Is Embodied Human Important?

Generating virtual humans that interact with the world naturally and intuitively is a challenging task with important applications in virtual reality, simulation, and robotics. Prior research in this domain has focused primarily on creating realistic appearances and replicating diverse human movements. However, appearance and motion alone do not make a true virtual human. A virtual human also requires cognitive abilities: the mind drives the body to interact with the environment, and these interactions in turn shape the mind. This interplay is the focus of our work.

The Challenges for Embodied Human

(1) Cognitive Architecture. How can we simulate the complex, dynamic nature of a virtual human's internal mental states and their interplay, through embodied actions, with an external, ever-changing environment?
(2) Embodied Action. How can the virtual human physically enact its intentions through its body, interact with the environment, and, in turn, allow these interactions to shape its mental states?

Project Abstract

Building virtual humans requires more than realistic appearances and diverse movements; it requires simulating the intricate interplay between internal cognitive states and the external environment, as framed by the concept of embodied cognition. In this paper, we propose an embodied cognitive architecture, EmbodiedHuman, that captures this interaction by integrating "Mind", a structured cognitive module, with motor execution to drive the virtual human's behavior within an interactive 3D environment. To integrate cognitive states with physical execution, we introduce three novel modules: (1) a cognition-inspired Mind structure, which models high-level reasoning and decision-making through key causal variables (value, belief, desire, and intention); (2) an action execution module, which translates internal intentions into embodied movements, enabling physically grounded interactions; and (3) an exploration module, which lets the agent actively explore the environment and update its mental states using feedback from its actions. Our approach allows virtual humans to continuously adapt, learn, and evolve their behavior in response to environmental changes, supporting dynamic, natural, human-like interactions over long horizons. Extensive experiments demonstrate the flexibility and scalability of our method in simulating individualized, daily-level behaviors in unknown environments. We hope EmbodiedHuman can serve as a baseline prototype to advance research in embodied cognition and virtual human modeling.

Our Approach - EmbodiedHuman

EmbodiedHuman Framework Overview. To simulate a true "virtual human" in an unfamiliar environment, we design an embodied cognitive architecture, EmbodiedHuman, which shapes the mind with value, belief, desire, and intention, and couples cognition with action for embodied interaction. Starting from the value (the virtual human's profile) and the belief (a dynamic scene graph), we use an LLM to generate temporary desires, which are then translated into concrete intentions. Two interactive modules enable embodied action execution and dynamic environment exploration.
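
The flow described above (value and belief feed an LLM, which produces a desire that is then turned into intentions) can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the class fields, `build_prompt`, `mind_step`, the `EDIBLE` labels, and both stub functions are hypothetical names introduced here, and the LLM is replaced by a fixed-output stand-in.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Value:
    """Static profile of the virtual human (hypothetical fields)."""
    name: str
    traits: List[str]


@dataclass
class Belief:
    """Toy scene graph: object name -> location the agent believes it is in."""
    objects: Dict[str, str] = field(default_factory=dict)


@dataclass
class Intention:
    """A concrete step: an action applied to a target object."""
    action: str
    target: str


def build_prompt(value: Value, belief: Belief) -> str:
    """Serialize value and belief into a text prompt for the language model."""
    scene = ", ".join(f"{obj} in {loc}" for obj, loc in sorted(belief.objects.items()))
    return f"{value.name} ({', '.join(value.traits)}) believes: {scene}. What do they want?"


def mind_step(
    value: Value,
    belief: Belief,
    llm: Callable[[str], str],
    planner: Callable[[str, Belief], List[Intention]],
) -> List[Intention]:
    """One pass of the loop: value + belief -> desire -> intentions."""
    desire = llm(build_prompt(value, belief))
    return planner(desire, belief)


EDIBLE = {"Apple", "Chip", "Tomato"}  # hypothetical object labels


def stub_llm(prompt: str) -> str:
    """Stand-in for a real LLM call; always returns the same desire."""
    return "eat a snack"


def stub_planner(desire: str, belief: Belief) -> List[Intention]:
    """Ground an eating desire in the first edible object the agent believes exists."""
    if "eat" not in desire:
        return []
    for obj in sorted(belief.objects):
        if obj in EDIBLE:
            return [Intention("walk_to", obj), Intention("eat", obj)]
    return []


ben = Value(name="Ben", traits=["likes snacks"])
scene = Belief(objects={"Chip": "kitchen", "Laptop": "desk"})
plan = mind_step(ben, scene, stub_llm, stub_planner)
```

Passing the LLM and planner in as callables keeps the loop itself independent of any particular model or planning method, which is an assumption of this sketch rather than a statement about the actual architecture.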

Action Execution

Overview of the action execution module. Text controls the motion style of Global Motion Latent Diffusion (GMLD). The diffusion process is further guided by gradients from spatial constraints, where red curves indicate sparse joint locations (taking root and hand joints as examples) and green arrays represent global orientation.
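
To illustrate what guiding a diffusion process with gradients from spatial constraints can look like, here is a toy sketch. It makes several simplifying assumptions: the motion is a flat list of coordinates rather than a latent code, `toy_denoise` is a placeholder for the learned denoiser, the constraint is a squared distance to target joint positions, and all function names and the `scale` parameter are hypothetical. In a latent diffusion model the gradient would presumably have to be propagated through a decoder, which is omitted here.

```python
from typing import List, Tuple


def constraint_loss_and_grad(
    motion: List[float], targets: List[float], mask: List[float]
) -> Tuple[float, List[float]]:
    """Squared distance to the targets, counted only where mask is 1.

    Returns the loss and its gradient with respect to the motion.
    """
    loss = sum(m * (x - t) ** 2 for x, t, m in zip(motion, targets, mask))
    grad = [2.0 * m * (x - t) for x, t, m in zip(motion, targets, mask)]
    return loss, grad


def toy_denoise(motion: List[float]) -> List[float]:
    """Placeholder for the learned denoiser: shrinks the sample toward zero."""
    return [0.9 * x for x in motion]


def guided_sampling(
    motion: List[float],
    targets: List[float],
    mask: List[float],
    num_steps: int = 20,
    scale: float = 0.25,
) -> List[float]:
    """Alternate a denoising step with a gradient step on the constraint loss."""
    x = list(motion)  # work on a copy so the input is left untouched
    for _ in range(num_steps):
        x = toy_denoise(x)
        _, grad = constraint_loss_and_grad(x, targets, mask)
        # Move against the gradient; unconstrained entries have zero gradient.
        x = [xi - scale * gi for xi, gi in zip(x, grad)]
    return x


start = [3.0, -2.0, 0.5, 4.0]
targets = [1.0, 1.0, 0.0, 0.0]
mask = [1.0, 1.0, 0.0, 0.0]  # only the first two coordinates are constrained

loss_before, _ = constraint_loss_and_grad(start, targets, mask)
result = guided_sampling(start, targets, mask)
loss_after, _ = constraint_loss_and_grad(result, targets, mask)
```

In this toy setup the constrained coordinates settle near, but not exactly on, their targets because the placeholder denoiser keeps pulling toward zero; the guidance scale trades off the two.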

Analysis of EmbodiedHuman

Value influences virtual human behavior. Two virtual humans (i.e., Ben and Rachel) tend to eat different foods (Chip or Tomato) according to their values.

Environment influences virtual human behavior. Armin chooses different assets (Laptop or Cellphone) to fulfill its desire based on its belief about the environment.

Scene dynamics influence virtual human behavior. The virtual human chooses to eat the Apple when the Tomato has been eaten.

Environmental Exploration. The virtual human actively explores the environment and updates its desire.
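
The cases above can be mimicked with a very small toy model, shown here only to make the dependencies explicit. It assumes that value is reduced to an ordered preference list and belief to a set of object names the agent thinks are available; in the described system an LLM makes these choices, so `choose_object`, `explore`, and the preference lists are purely hypothetical.

```python
from typing import Dict, List, Optional, Set


def choose_object(preferences: List[str], believed_available: Set[str]) -> Optional[str]:
    """Pick the most preferred object that the agent believes is available."""
    for obj in preferences:
        if obj in believed_available:
            return obj
    return None


def explore(belief: Set[str], room_contents: Dict[str, Set[str]], room: str) -> Set[str]:
    """Return an updated belief that includes whatever is found in the visited room."""
    return belief | room_contents.get(room, set())


# Value: different profiles lead to different choices in the same scene.
scene = {"Chip", "Tomato"}
ben_choice = choose_object(["Chip", "Tomato"], scene)     # "Chip"
rachel_choice = choose_object(["Tomato", "Chip"], scene)  # "Tomato"

# Scene dynamics: once the Tomato has been eaten, the agent falls back to the Apple.
prefs = ["Tomato", "Apple"]
before_eaten = choose_object(prefs, {"Tomato", "Apple"})  # "Tomato"
after_eaten = choose_object(prefs, {"Apple"})             # "Apple"

# Exploration: with an empty belief nothing can be chosen until a room is visited.
rooms = {"kitchen": {"Apple"}, "study": {"Laptop"}}
belief: Set[str] = set()
nothing_yet = choose_object(prefs, belief)                # None
belief = explore(belief, rooms, "kitchen")
after_exploring = choose_object(prefs, belief)            # "Apple"
```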

This website is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.