PLANA3R: Zero-shot Metric Planar 3D Reconstruction via Feed-Forward Planar Splatting

Changkun Liu 1, 2, *, Bin Tan 2, *, Zeran Ke 2, 3, Shangzhan Zhang 2, 4, Jiachen Liu 5,
Ming Qian 2, 3, Nan Xue 2, †, Yujun Shen 2, Tristan Braud 1
1 The Hong Kong University of Science and Technology, 2 Ant Group, 3 Wuhan University, 4 Zhejiang University, 5 The Pennsylvania State University
* Equal Contribution, † Corresponding Author

NeurIPS 2025

Abstract

This paper addresses metric 3D reconstruction of indoor scenes by exploiting their inherent geometric regularities with compact representations. Using planar 3D primitives, a representation well suited to man-made environments, we introduce PLANA3R, a pose-free framework for metric 3D reconstruction from unposed two-view images. Our approach employs Vision Transformers to extract a set of sparse planar primitives and estimate relative camera poses, and it supervises geometry learning via planar splatting, where gradients are propagated through high-resolution rendered depth and normal maps of the primitives. Unlike prior feed-forward methods that require 3D plane annotations during training, PLANA3R learns planar 3D structures without explicit plane supervision, enabling scalable training on large-scale stereo datasets using only depth and normal annotations. We validate PLANA3R on multiple indoor-scene datasets with metric supervision and demonstrate strong generalization to out-of-domain indoor environments across diverse tasks under metric evaluation protocols, including 3D surface reconstruction, depth estimation, and relative pose estimation. Furthermore, because the scene is represented with planar 3D primitives, the ability to segment planes accurately emerges without being trained for explicitly.


Method

Overview of PLANA3R. Given two images captured from the same scene, PLANA3R outputs a set of 3D planar primitives and the 6-DoF relative camera pose $P_{\text{rel}}$, both in metric scale. PLANA3R does not predict one primitive per pixel. Instead, it employs a deconvolution network to predict primitives at two distinct resolutions, based on the patch divisions from the ViT encoder.
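
To make the two outputs concrete, the following is a minimal NumPy sketch of how a single plane in a camera frame yields a depth map, and how a rigid 6-DoF pose moves that plane into another camera frame. It is not the PLANA3R implementation or its planar splatting renderer: it assumes an infinite plane written as $n \cdot X = d$ and a pinhole camera, and the function names `plane_depth_map` and `transform_plane`, the intrinsics `K`, and the pose convention $X' = RX + t$ are all illustrative assumptions.

```python
import numpy as np


def plane_depth_map(normal, offset, K, height, width):
    """Depth (camera-frame z) of the infinite plane n . X = offset.

    Assumption: pinhole intrinsics K and pixel centers at half-integer
    coordinates. Pixels whose ray misses the plane in front of the
    camera are set to NaN.
    """
    normal = np.asarray(normal, dtype=np.float64)
    u, v = np.meshgrid(np.arange(width) + 0.5, np.arange(height) + 0.5)
    pixels = np.stack([u, v, np.ones_like(u)], axis=-1)  # (H, W, 3)
    rays = pixels @ np.linalg.inv(K).T  # back-projected rays with z = 1
    denom = rays @ normal  # n . r for every pixel
    with np.errstate(divide="ignore", invalid="ignore"):
        depth = offset / denom  # X = depth * r lies on the plane
        invalid = ~np.isfinite(depth) | (depth <= 0)
    depth[invalid] = np.nan
    return depth


def transform_plane(normal, offset, R, t):
    """Express the plane n . X = offset in a frame where X' = R X + t."""
    normal = np.asarray(normal, dtype=np.float64)
    new_normal = R @ normal
    new_offset = offset + new_normal @ np.asarray(t, dtype=np.float64)
    return new_normal, new_offset


if __name__ == "__main__":
    # Hypothetical intrinsics and a fronto-parallel wall 2 m from the camera.
    K = np.array([[500.0, 0.0, 320.0],
                  [0.0, 500.0, 240.0],
                  [0.0, 0.0, 1.0]])
    depth = plane_depth_map([0.0, 0.0, 1.0], 2.0, K, height=480, width=640)
    print(depth.min(), depth.max())

    # Shift the frame 1 m along z: the same wall is now 3 m away.
    print(transform_plane([0.0, 0.0, 1.0], 2.0, np.eye(3), [0.0, 0.0, 1.0]))
```

The per-pixel normal of such a plane is simply `normal` wherever the depth is valid, which is why a small set of planar primitives can be compared against dense depth and normal annotations.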

[Figure: PLANA3R architecture]


Results Analysis & Detailed Visualization

Detailed reconstruction results are shown for seven scenes. Each scene has two input views (View 0 and View 1), a plane segmentation of each view, and three types of 3D reconstruction model (RGB, Segmentation, Raw Primitives), each available for model versions v1 and v2.

BibTeX

@inproceedings{
liu2025plana3r,
title={PLANA3R: Zero-shot Metric Planar 3D Reconstruction via Feed-forward Planar Splatting},
author={Changkun Liu and Bin Tan and Zeran Ke and Shangzhan Zhang and Jiachen Liu and Ming Qian and Nan Xue and Yujun Shen and Tristan Braud},
booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems},
year={2025}
}

Acknowledgements

This work was supported by the Ant Group Research Intern Program and the Ant Group Postdoctoral Program.