Unleashing Guidance Without Classifiers for Human-Object Interaction Animation

1University of Illinois Urbana-Champaign, 2Snap Inc.
ICLR 2026
TL;DR: LIGHT generates realistic human-object interaction animations by denoising different components of the motion at different speeds, so cleaner components naturally guide noisier ones, producing contact-aware guidance without any external classifiers or hand-crafted priors.

Abstract

Generating realistic human-object interaction (HOI) animations remains challenging because it requires jointly modeling dynamic human actions and diverse object geometries. Prior diffusion-based approaches often rely on handcrafted contact priors or human-imposed kinematic constraints to improve contact quality. We propose a data-driven alternative in which guidance emerges from the denoising pace itself, reducing dependence on manually designed priors. Building on diffusion forcing, we factor the representation into modality-specific components and assign individualized noise levels with asynchronous denoising schedules. In this paradigm, cleaner components guide noisier ones through cross-attention, yielding guidance without auxiliary classifiers. We find that this data-driven guidance is inherently contact-aware, and can be further enhanced when training is augmented with a broad spectrum of synthetic object geometries, encouraging invariance of contact semantics to geometric diversity. Extensive experiments show that pace-induced guidance more effectively mirrors the benefits of contact priors than conventional classifier-free guidance, while achieving higher contact fidelity, more realistic HOI generation, and stronger generalization to unseen objects and tasks.

Approach Overview


Overview of LIGHT. Left: Training. We factor the motion into modality-specific components, e.g., body, hand, and object, each diffused with its own independently sampled noise level. Right: Inference. We compare a uniform schedule, which denoises all modalities synchronously, with a staged schedule, which keeps one modality cleaner than the others so that it guides the rest.
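The two schedules above can be sketched in a few lines. This is an illustrative toy, not the paper's implementation: the modality names, the total timestep count `T`, the `lag` offset, and the helper names (`sample_training_noise_levels`, `staged_schedule`) are all assumptions made for the example.

```python
import random

MODALITIES = ["body", "hand", "object"]
T = 1000  # assumed total number of diffusion timesteps

def sample_training_noise_levels(modalities=MODALITIES, timesteps=T):
    """Diffusion-forcing-style training: each modality receives its own
    independently sampled noise level (timestep) instead of one shared t."""
    return {m: random.randrange(timesteps) for m in modalities}

def staged_schedule(lead="object", lag=200, steps=50, timesteps=T):
    """Staged inference: the `lead` modality runs `lag` timesteps ahead of
    the others, so it stays cleaner and can guide them via cross-attention.
    Returns, for each denoising step, the per-modality timestep."""
    # uniform grid of timesteps from timesteps-1 down to 0
    grid = [round((timesteps - 1) * (1 - s / (steps - 1))) for s in range(steps)]
    schedule = []
    for t in grid:
        step = {m: t for m in MODALITIES}       # uniform baseline
        step[lead] = max(t - lag, 0)            # lead modality is cleaner
        schedule.append(step)
    return schedule
```

Setting `lag=0` recovers the uniform schedule, which is the comparison the figure draws.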

Gallery of Generation

Ablation on the Augmentation

Ablation on the Guidance

BibTeX

@inproceedings{wang2026unleashing,
  title     = {Unleashing Guidance Without Classifiers for Human-Object Interaction Animation},
  author    = {Wang, Ziyin and Xu, Sirui and Guo, Chuan and Zhou, Bing and Gong, Jiangshan and Wang, Jian and Wang, Yu-Xiong and Gui, Liang-Yan},
  booktitle = {ICLR},
  year      = {2026}
}