FlexAvatar: Flexible Large Reconstruction Model for Animatable Gaussian Head Avatars with Detailed Deformation

Abstract

We present FlexAvatar, a flexible large reconstruction model that builds high-fidelity 3D head avatars with detailed dynamic deformation from single or sparse images, without requiring camera poses or expression labels. It uses a transformer-based reconstruction model whose structured head query tokens serve as canonical anchors, aggregating a flexible number of input images, with neither camera poses nor expression labels, into a robust canonical 3D representation. For detailed dynamic deformation, we introduce a lightweight UNet decoder conditioned on UV-space position maps, which produces detailed expression-dependent deformations in real time. To better capture rare but critical expression details such as wrinkles and bared teeth, we also adopt a data distribution adjustment strategy that balances the distribution of these expressions in the training set. Moreover, a lightweight 10-second refinement can further enhance identity-specific details for extreme identities without affecting deformation quality. Extensive experiments demonstrate that FlexAvatar achieves superior 3D consistency and detailed dynamic realism compared with previous methods, providing a practical solution for animatable 3D avatar creation.

Method Overview

FlexAvatar reconstructs a high-quality Gaussian head avatar by mapping input images with varying expressions and camera views into Gaussian representations in UV space, through the following steps:

Use a flexible feed-forward backbone to obtain static Gaussian maps and an identity feature map from the input images.
Convert the driving expression signal into a FLAME UV position map and concatenate it with the identity features.
Feed the concatenated representation into a UNet to generate dynamic Gaussian attributes, then sample them into FLAME space with LBS (linear blend skinning) for rendering.
Optionally apply an efficient refinement to improve the results.
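
As a rough illustration of two of the steps above, the sketch below shows channel-wise concatenation of a UV position map with an identity feature map, and generic linear blend skinning (LBS) applied to a set of points. It is a minimal NumPy sketch under stated assumptions, not FlexAvatar's implementation: the function names `build_unet_input` and `linear_blend_skinning`, the array shapes, and the toy two-joint setup are all hypothetical, and the UNet itself is omitted.

```python
import numpy as np


def build_unet_input(uv_position_map, identity_features):
    """Concatenate two UV-space maps along the channel axis.

    uv_position_map:   (H, W, 3) array of 3D positions per UV texel (assumed layout).
    identity_features: (H, W, C) array of identity features (assumed layout).
    Returns an (H, W, 3 + C) array.
    """
    return np.concatenate([uv_position_map, identity_features], axis=-1)


def linear_blend_skinning(points, weights, transforms):
    """Generic LBS: pose each point with a weighted blend of joint transforms.

    points:     (N, 3) rest-pose positions.
    weights:    (N, J) skinning weights, each row summing to 1.
    transforms: (J, 4, 4) homogeneous joint transforms.
    Returns (N, 3) posed positions.
    """
    # Blend the joint transforms separately for every point: (N, 4, 4).
    blended = np.einsum("nj,jab->nab", weights, transforms)
    # Lift points to homogeneous coordinates: (N, 4).
    ones = np.ones((points.shape[0], 1))
    homogeneous = np.concatenate([points, ones], axis=1)
    # Apply each point's blended transform and drop the homogeneous coordinate.
    posed = np.einsum("nab,nb->na", blended, homogeneous)
    return posed[:, :3]


# Toy usage with hypothetical sizes: a 4x4 UV map and 8 feature channels.
uv_pos = np.zeros((4, 4, 3))
feat = np.ones((4, 4, 8))
unet_in = build_unet_input(uv_pos, feat)

# Two joints: identity, and a translation of 2 units along z.
translate = np.eye(4)
translate[:3, 3] = [0.0, 0.0, 2.0]
transforms = np.stack([np.eye(4), translate])

points = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
weights = np.array([[1.0, 0.0], [0.5, 0.5]])
posed = linear_blend_skinning(points, weights, transforms)
```

In this toy setup the first point follows only the identity joint and stays in place, while the second point is weighted equally between the two joints and therefore moves half of the translation.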

Ultimately, our proposed FlexAvatar produces detailed, real-time 360° reenactment renderings.

¹Tsinghua University

²ByteDance
