Multiview Equivariance Improves 3D Correspondence Understanding with Minimal Feature Finetuning
Published in ICLR, 2025
This work evaluates and improves the 3D awareness of Vision Transformer (ViT)-based models. It shows that stronger 3D equivariance in their semantic embeddings, meaning features that stay consistent across viewpoints, leads to better performance on downstream tasks such as pose estimation and tracking. The authors propose a simple finetuning strategy based on 3D correspondences and demonstrate substantial improvements even with minimal finetuning on a single object.
Recommended citation: You, Y., Li, Y., Deng, C., Wang, Y., & Guibas, L. (2024). Multiview Equivariance Improves 3D Correspondence Understanding with Minimal Feature Finetuning. arXiv preprint arXiv:2411.19458.