Zero-shot depth completion has gained attention for its ability to generalize across environments without sensor-specific datasets or retraining. However, most existing approaches rely on diffusion-based test-time optimization, which is computationally expensive due to iterative denoising. Recent visual-prompt-based methods reduce training cost but still require repeated forward-backward passes through the full frozen network to optimize input-level prompts, resulting in slow inference. In this work, we show that adapting only the decoder is sufficient for effective test-time optimization, as depth foundation models concentrate depth-relevant information within a low-dimensional decoder subspace. Based on this insight, we propose a lightweight test-time adaptation method that updates only this low-dimensional subspace using sparse depth supervision. Our approach achieves state-of-the-art performance, establishing a new Pareto frontier between accuracy and efficiency for test-time adaptation. Extensive experiments on five indoor and outdoor datasets demonstrate consistent improvements over prior methods, highlighting the practicality of fast zero-shot depth completion.
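To make the idea of updating only a low-dimensional decoder subspace concrete, here is a minimal NumPy sketch. It is not the released implementation: it assumes a toy linear decoder head, per-pixel features stored as rows, and a LoRA-style factorization `W + A @ B` in which only `A` and `B` are trained on the sparse depth points. The function name `adapt_low_rank` and all parameters are hypothetical.

```python
import numpy as np

def adapt_low_rank(feats, W, sparse_depth, mask, rank=2, lr=0.05, steps=300, seed=0):
    """Toy test-time adaptation of a frozen linear depth head.

    feats:        (N, C) frozen decoder features, one row per pixel (assumed layout)
    W:            (C, 1) frozen projection from features to depth
    sparse_depth: (N,)   sparse depth values, only read where mask is True
    mask:         (N,)   boolean validity mask of the sparse measurements
    Returns the adapted weight W + A @ B; W itself is never modified.
    """
    rng = np.random.default_rng(seed)
    C = feats.shape[1]
    A = rng.normal(scale=0.3, size=(C, rank))  # trainable factor
    B = np.zeros((rank, 1))                    # zero init: adaptation starts at the frozen model
    n_valid = max(int(mask.sum()), 1)

    for _ in range(steps):
        pred = (feats @ (W + A @ B))[:, 0]
        # Residual only at pixels that carry a sparse depth measurement.
        resid = np.where(mask, pred - sparse_depth, 0.0)
        # Gradient of the masked mean squared error w.r.t. the effective weight.
        g = feats.T @ resid[:, None] * (2.0 / n_valid)
        # Chain rule through the low-rank product A @ B.
        grad_A = g @ B.T
        grad_B = A.T @ g
        A -= lr * grad_A
        B -= lr * grad_B

    return W + A @ B
```

In this sketch the number of trainable values is `rank * (C + 1)` rather than the full size of the network, which is the property the abstract relies on for fast adaptation; a real depth foundation model would apply the same pattern to its decoder layers with automatic differentiation.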
Top: error maps w.r.t. ground truth. Bottom: predicted depth maps.
Blue dashed boxes highlight representative regions.
Our method establishes a new Pareto frontier among zero-shot depth completion methods, combining the lowest error with highly efficient inference. Compared to TestPromptDC, it reduces MAE by 22.2% and RMSE by 19.6% on average across five datasets and performs best on most benchmarks.
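For readers who want to reproduce this kind of comparison, the following sketch shows one common way to compute MAE and RMSE on pixels with valid ground truth, and a relative reduction in percent. The helper names are hypothetical, and the page does not state whether the reported averages are means of per-dataset reductions or reductions of averaged errors, so treat the aggregation as an assumption.

```python
import numpy as np

def masked_mae_rmse(pred, gt, valid):
    """MAE and RMSE over pixels where ground-truth depth is available."""
    err = pred[valid] - gt[valid]
    mae = float(np.abs(err).mean())
    rmse = float(np.sqrt((err ** 2).mean()))
    return mae, rmse

def relative_reduction(baseline, ours):
    """Percentage by which `ours` lowers an error metric relative to `baseline`."""
    return 100.0 * (baseline - ours) / baseline
```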
(a) Training-based depth completion requires offline training with paired RGB–depth data. (b) Test-time optimization methods adapt latent variables or visual prompts at inference time, incurring high computational cost. (c) Our method adapts only a low-dimensional subspace of the decoder, enabling efficient test-time adaptation.
Motivation. (a) Correlation with the final depth output is low in the encoder but increases sharply in the decoder. (b) PCA (PC1) shows decoder features align with the final depth map, indicating a low-dimensional depth subspace.
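The analysis in (b) can be approximated as follows. This is a sketch under assumptions, not the exact procedure behind the figure: it assumes the feature map has already been resampled to the depth map's resolution and uses the absolute Pearson correlation between the first principal component and the depth map. The function name `pc1_depth_correlation` is hypothetical.

```python
import numpy as np

def pc1_depth_correlation(feats, depth):
    """Absolute Pearson correlation between PC1 of a feature map and a depth map.

    feats: (C, H, W) feature map from one layer (assumed same H, W as depth)
    depth: (H, W)    final predicted depth map
    """
    C = feats.shape[0]
    X = feats.reshape(C, -1).T               # (H*W, C): one feature vector per pixel
    X = X - X.mean(axis=0, keepdims=True)    # center before PCA
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    pc1 = X @ vt[0]                          # projection of each pixel onto PC1
    r = np.corrcoef(pc1, depth.ravel())[0, 1]
    return float(abs(r))                     # the sign of a principal component is arbitrary
```

Evaluating this layer by layer would give a curve comparable in spirit to (a): values near 1 indicate that a single direction in feature space already explains the depth map.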
Efficiency–accuracy trade-off. Our method achieves a favorable speed–accuracy trade-off with minimal trainable parameters and fast adaptation.
Optimization progress over iterations. Top: error maps. Bottom: predicted depth maps.
@article{seo2026efficient,
title={Efficient Test-Time Optimization for Depth Completion via Low-Rank Decoder Adaptation},
author={Seo, Minseok and Lee, Wonjun and Jang, Jaehyuk and Kim, Changick},
journal={arXiv preprint arXiv:2603.01765},
year={2026},
}