Generating virtual humans that interact with the world naturally and intuitively is a challenging task with important applications in virtual reality, simulation, and robotics. Traditional research in this domain has focused primarily on creating realistic appearances and replicating diverse human movements. However, appearance and motion alone do not make a truly virtual human. A virtual human requires cognitive abilities: the mind drives the body to interact with the environment, and these interactions in turn shape the mind. This coupling is the focus of our work, which addresses two questions:
(1) Cognitive Architecture. How can we simulate the complex, dynamic nature of a virtual human's internal mental states and their interplay with the external, ever-changing environment through embodied actions?
(2) Embodied Action. How can the virtual human physically enact its intentions through its body, interact with the environment, and, in turn, allow these interactions to shape its mental states?
EmbodiedHuman Framework Overview. To simulate a truly "virtual human" in an unfamiliar environment, we design an embodied cognitive architecture, EmbodiedHuman, that shapes the mind with value, belief, desire, and intention, and couples cognition with action for embodied interaction. Starting from the value (virtual human profile) and the belief (dynamic scene graph), we leverage an LLM to generate temporary desires, which are then translated into concrete intentions. Two interactive modules enable embodied action execution and dynamic environment exploration.
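The value, belief, desire, and intention flow described above can be illustrated with a toy sketch. This is not the EmbodiedHuman implementation: the class names `Profile` and `SceneGraph`, the functions `propose_desire` and `form_intention`, and the preference-list representation of value are all hypothetical, and the LLM call is replaced by a simple rule so the example runs on its own.

```python
from dataclasses import dataclass, field


@dataclass
class SceneGraph:
    """Belief: the objects the agent currently knows about (hypothetical structure)."""
    objects: dict = field(default_factory=dict)  # name -> {"category", "available"}

    def observe(self, name, category, available=True):
        # Add or update an object; called whenever the agent perceives the scene.
        self.objects[name] = {"category": category, "available": available}

    def available(self, category):
        # Names of known, still-available objects of the given category.
        return sorted(
            name for name, obj in self.objects.items()
            if obj["category"] == category and obj["available"]
        )


@dataclass
class Profile:
    """Value: a stand-in profile reduced to a ranked list of preferred objects."""
    name: str
    preferences: list  # object names, most preferred first


def propose_desire(profile, belief):
    """Placeholder for the LLM step: map value and belief to a temporary desire."""
    return "eat" if belief.available("food") else "explore"


def form_intention(desire, profile, belief):
    """Translate a desire into a concrete (action, target) intention."""
    if desire == "eat":
        options = belief.available("food")
        ranked = [p for p in profile.preferences if p in options]
        target = ranked[0] if ranked else options[0]
        return ("eat", target)
    return ("explore", None)


if __name__ == "__main__":
    belief = SceneGraph()
    for food in ("Chip", "Tomato", "Apple"):
        belief.observe(food, "food")
    rachel = Profile("Rachel", ["Tomato", "Apple", "Chip"])
    print(form_intention(propose_desire(rachel, belief), rachel, belief))
    # The scene changes: the Tomato is gone, so the belief is updated.
    belief.observe("Tomato", "food", available=False)
    print(form_intention(propose_desire(rachel, belief), rachel, belief))
```

In this sketch, changing the profile changes the chosen target, and updating the scene graph changes it again, which mirrors in miniature how value and belief both feed into the intention.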
Overview of the action execution module. A text prompt controls the motion style generated by Global Motion Latent Diffusion (GMLD). The diffusion process is further guided by gradients from spatial constraints, where red curves indicate sparse joint locations (taking root and hand joints as examples) and green arrows represent global orientation.
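The idea of guiding a diffusion process with gradients from spatial constraints can be sketched in a few lines. The example below is a deliberate simplification and not GMLD itself: it works directly on joint positions rather than in a latent space, the denoiser `denoise_fn` is a placeholder supplied by the caller, and the function names, the binary `mask` convention, and `guidance_scale` are assumptions made for illustration.

```python
import numpy as np


def constraint_loss_and_grad(joints, targets, mask):
    """Squared distance between constrained joints and their target locations.

    joints, targets: arrays of shape (T, J, 3) for T frames and J joints.
    mask: binary array of shape (T, J), 1 where a sparse constraint is given.
    Returns the scalar loss and its gradient with respect to `joints`.
    """
    diff = (joints - targets) * mask[..., None]
    loss = float(np.sum(diff ** 2))
    grad = 2.0 * diff  # zero for unconstrained joints because mask is binary
    return loss, grad


def guided_step(joints, denoise_fn, targets, mask, guidance_scale=0.1):
    """One denoising step followed by a gradient nudge toward the constraints."""
    joints = denoise_fn(joints)
    _, grad = constraint_loss_and_grad(joints, targets, mask)
    return joints - guidance_scale * grad


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    motion = rng.normal(size=(4, 3, 3))      # 4 frames, 3 joints
    targets = np.zeros_like(motion)
    mask = np.zeros((4, 3))
    mask[:, 0] = 1.0                         # constrain only joint 0 (e.g. the root)
    for _ in range(50):
        # Identity denoiser stands in for the learned model.
        motion = guided_step(motion, lambda x: x, targets, mask)
    print(constraint_loss_and_grad(motion, targets, mask)[0])
```

Only the masked joints are pulled toward their targets; the remaining joints are left to the denoiser, which is the sense in which sparse constraints steer the sample without fully specifying it.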
Value influences virtual human behavior. Two virtual humans (i.e., Ben and Rachel) tend to eat different foods (Chip or Tomato) according to their values.
Environment influences virtual human behavior. Armin chooses different assets (Laptop or Cellphone) to fulfill its desire based on its belief about the environment.
Scene dynamics influence virtual human behavior. The virtual human chooses to eat the Apple once the Tomato has already been eaten.
Environmental Exploration. The virtual human actively explores the environment and updates its desire.