This paper addresses metric 3D reconstruction of indoor scenes by exploiting their inherent geometric regularities with compact representations. Using planar 3D primitives, a representation well suited to man-made environments, we introduce PLANA3R, a pose-free framework for metric 3D reconstruction from unposed two-view images. Our approach employs Vision Transformers to extract a sparse set of planar primitives and estimate the relative camera pose, and it supervises geometry learning via planar splatting, in which gradients are propagated through high-resolution rendered depth and normal maps of the primitives. Unlike prior feedforward methods that require 3D plane annotations during training, PLANA3R learns planar 3D structures without explicit plane supervision, enabling scalable training on large-scale stereo datasets using only depth and normal annotations. We validate PLANA3R on multiple indoor-scene datasets with metric supervision and demonstrate strong generalization to out-of-domain indoor environments across diverse tasks under metric evaluation protocols, including 3D surface reconstruction, depth estimation, and relative pose estimation. Furthermore, because it is formulated with a planar 3D representation, our method also acquires the ability to segment planes accurately.
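To make the idea of rendering depth and normal maps from a planar primitive concrete, the following is a minimal sketch, not the planar splatting used by PLANA3R. It assumes a pinhole camera with intrinsics K and a single unbounded plane n·X = offset expressed in camera coordinates, and it computes z-depth by plain ray-plane intersection. The function name render_plane_depth_normal and all parameters are hypothetical; primitive extents, blending across primitives, and gradient propagation are not handled.

```python
import numpy as np

def render_plane_depth_normal(K, normal, offset, height, width):
    """Render z-depth and normals for one unbounded plane n.X = offset.

    Assumptions (illustrative only): pinhole intrinsics K (3x3), plane
    given in camera coordinates, offset measured along the unit normal.
    """
    n = np.asarray(normal, dtype=np.float64)
    n = n / np.linalg.norm(n)  # normalise defensively

    # Pixel centres in homogeneous image coordinates, shape (H, W, 3).
    u, v = np.meshgrid(np.arange(width) + 0.5, np.arange(height) + 0.5)
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)

    # Back-project to viewing rays with z = 1, so the ray scale is z-depth.
    rays = pix @ np.linalg.inv(K).T

    # Ray-plane intersection: depth * (n . ray) = offset.
    denom = rays @ n
    with np.errstate(divide="ignore", invalid="ignore"):
        depth = offset / denom
        valid = (np.abs(denom) > 1e-8) & (depth > 0)

    # Pixels whose rays miss the plane (parallel or behind camera) get 0.
    depth = np.where(valid, depth, 0.0)
    normals = np.where(valid[..., None], n, 0.0)
    return depth, normals, valid
```

For a fronto-parallel plane (normal (0, 0, 1), offset 2.0) every pixel receives a depth of 2.0; for a tilted plane, back-projecting each pixel by its depth gives a 3D point that satisfies the plane equation. Maps of this kind can be compared against depth and normal annotations, which is the type of supervision the abstract describes.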
Overview of PLANA3R. Given two images captured from the same scene, PLANA3R outputs a set of 3D planar primitives and the 6-DoF relative camera pose $P_{\text{rel}}$ in metric scale. PLANA3R does not perform per-pixel primitive prediction. Instead, it employs a deconvolution network to predict primitives at two distinct resolutions, based on the patch divisions from the ViT encoder.
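As an illustration of what a 6-DoF relative pose in metric scale means, the sketch below composes a relative pose from two camera-to-world matrices and scores a prediction against a reference. The conventions are assumptions, not taken from the paper: poses are 4x4 homogeneous camera-to-world matrices, the relative pose maps camera-1 coordinates to camera-2 coordinates, and the error measures (geodesic rotation angle in degrees, Euclidean translation error in the pose's metric unit) are one common choice. The function names relative_pose and pose_errors are hypothetical.

```python
import numpy as np

def relative_pose(T_w_c1, T_w_c2):
    """Return the 4x4 transform taking camera-1 coordinates to camera-2.

    Assumes both inputs are camera-to-world matrices (an illustrative
    convention; other codebases use world-to-camera).
    """
    return np.linalg.inv(T_w_c2) @ T_w_c1

def pose_errors(P_pred, P_gt):
    """Rotation error in degrees and translation error in metric units."""
    R_pred, t_pred = P_pred[:3, :3], P_pred[:3, 3]
    R_gt, t_gt = P_gt[:3, :3], P_gt[:3, 3]

    # Geodesic angle between the two rotations, clipped for numerical safety.
    cos_angle = (np.trace(R_pred.T @ R_gt) - 1.0) / 2.0
    rot_err_deg = np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))

    # With metric-scale poses the translation difference is meaningful as-is.
    trans_err = np.linalg.norm(t_pred - t_gt)
    return rot_err_deg, trans_err
```

Because the pose is metric, the translation error can be read directly in the scene's units; a method that recovers translation only up to scale would need an angular translation error instead.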
@inproceedings{liu2025plana3r,
  title={PLANA3R: Zero-shot Metric Planar 3D Reconstruction via Feed-forward Planar Splatting},
  author={Changkun Liu and Bin Tan and Zeran Ke and Shangzhan Zhang and Jiachen Liu and Ming Qian and Nan Xue and Yujun Shen and Tristan Braud},
  booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems},
  year={2025}
}
This work was supported by the Ant Group Research Intern Program and the Ant Group Postdoctoral Program.