OmniMap: A Comprehensive Mapping Framework Integrating Optics, Geometry, and Semantics


Abstract

Robotic systems demand accurate and comprehensive 3D environment perception: simultaneous capture of photo-realistic appearance (optical), precise layout and shape (geometric), and open-vocabulary scene understanding (semantic). Existing methods typically satisfy only a subset of these requirements and exhibit optical blurring, geometric irregularities, and semantic ambiguities. To address these challenges, we propose OmniMap, the first online mapping framework that simultaneously captures optical, geometric, and semantic scene attributes while maintaining real-time performance and model compactness. At the architectural level, OmniMap employs a tightly coupled 3DGS–Voxel hybrid representation that combines fine-grained modeling with structural stability. At the implementation level, OmniMap identifies key challenges across the different modalities and introduces several innovations: adaptive camera modeling for motion blur and exposure compensation, a hybrid incremental representation with normal constraints, and probabilistic fusion for robust instance-level understanding. Extensive experiments show OmniMap’s superior performance in rendering fidelity, geometric accuracy, and zero-shot semantic segmentation compared with state-of-the-art methods across diverse scenes. The framework’s versatility is further evidenced by a variety of downstream applications, including multi-domain scene Q&A, interactive editing, perception-guided manipulation, and map-assisted navigation.
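The adaptive camera model is described only at a high level here. The sketch below shows one common way such terms are realized in 3DGS pipelines: motion blur approximated as the mean of sharp renders over poses interpolated across the exposure window, and exposure drift absorbed by a per-frame affine color transform. Every name in it (`render`, `slerp`, `blurred_render`, `expose`, `alpha`, `beta`) is an illustrative assumption, not OmniMap's actual API.

```python
import torch

def slerp(q0, q1, t):
    """Spherical interpolation between unit quaternions (w, x, y, z)."""
    dot = torch.dot(q0, q1)
    if dot < 0:                      # take the shorter arc
        q1, dot = -q1, -dot
    if dot > 0.9995:                 # nearly parallel: linear fallback
        q = q0 + t * (q1 - q0)
        return q / q.norm()
    theta = torch.acos(dot.clamp(-1.0, 1.0))
    return (torch.sin((1 - t) * theta) * q0
            + torch.sin(t * theta) * q1) / torch.sin(theta)

def blurred_render(render, q0, t0, q1, t1, n_sub=5):
    """Approximate motion blur as the mean of sharp renders at poses
    interpolated across the exposure window. `render(rotation, translation)`
    is an assumed rasterizer callback returning an image tensor."""
    frames = [render(slerp(q0, q1, s), (1 - s) * t0 + s * t1)
              for s in torch.linspace(0.0, 1.0, n_sub)]
    return torch.stack(frames).mean(dim=0)

def expose(image, alpha, beta):
    """Hypothetical per-frame affine exposure model I' = alpha * I + beta;
    alpha and beta would be optimized jointly with the Gaussians."""
    return alpha * image + beta
```

In schemes of this kind, `alpha` and `beta` are optimized per frame alongside the Gaussian parameters, so exposure changes are explained by the camera model rather than baked into the scene representation.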

Framework

The 2D Language Embeddings Extractor sequentially combines multiple open-set models for object detection, segmentation, captioning, and embedding extraction. The Probabilistic Voxel Reconstructor incrementally integrates per-frame instance masks and embeddings into 3D space, maintaining voxel-aligned probabilistic instance tuples and a global embedding codebook. The Motion-Robust 3DGS Incremental Reconstructor initializes new Gaussians from newly assigned voxels and renders RGB, depth, and normal images under parameterized camera models for supervision.
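As a concrete illustration of the probabilistic voxel step, the minimal sketch below maintains a per-voxel vote over instance IDs and a running-mean embedding per instance in a global codebook. The data layout and update rules are assumptions for exposition, not the paper's actual probabilistic instance tuples.

```python
import numpy as np
from collections import defaultdict

class ProbabilisticVoxelMap:
    """Minimal sketch: per-voxel instance votes plus a global running-mean
    embedding codebook. Layout and update rules are illustrative assumptions."""

    def __init__(self, voxel_size=0.05, embed_dim=512):
        self.voxel_size = voxel_size
        self.embed_dim = embed_dim
        self.votes = defaultdict(lambda: defaultdict(int))  # voxel key -> {instance_id: hits}
        self.codebook = {}                                  # instance_id -> (mean_embedding, n_obs)

    def integrate(self, points, instance_id, embedding):
        """Fuse one back-projected instance mask (points: (N, 3) array in
        world coordinates) and its language embedding ((embed_dim,) array)."""
        keys = {tuple(np.floor(p / self.voxel_size).astype(int)) for p in points}
        for key in keys:
            self.votes[key][instance_id] += 1
        mean, n = self.codebook.get(instance_id, (np.zeros(self.embed_dim), 0))
        self.codebook[instance_id] = ((mean * n + embedding) / (n + 1), n + 1)

    def label(self, point):
        """Return the most-voted instance ID for the voxel containing `point`."""
        key = tuple(np.floor(point / self.voxel_size).astype(int))
        hits = self.votes.get(key)
        return max(hits, key=hits.get) if hits else None
```

Keeping counts per voxel rather than a single hard label lets later observations overturn early mis-associations, which is the usual motivation for probabilistic fusion in incremental mapping.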


Benchmark Results


Optics: Image Rendering

Visualization of rendered scenes obtained by different methods.

[Image comparison: RGB renderings from RTG-SLAM, Gaussian-SLAM, MonoGS, SplaTAM, GSFusion, GS-ICP-SLAM, CaRtGS, and HI-SLAM2 versus Ours, shown against the ground-truth RGB.]

Geometry: Mesh Reconstruction

Visualization of reconstructed meshes obtained by different methods.

[Image comparison: RGB and normal views of meshes reconstructed by RTG-SLAM, Gaussian-SLAM, MonoGS, SplaTAM, GSFusion, GS-ICP-SLAM, and CaRtGS versus Ours, shown against the ground-truth RGB and normals.]

Semantics: Zero-shot Segmentation

Visualization of zero-shot semantic segmentation obtained by different methods.

[Image comparison: semantic segmentations from OpenGaussian, LangSplat, GraspSplats, ConceptFusion, ConceptGraphs, OpenFusion, and HOV-SG versus Ours, shown against the input RGB and the ground-truth semantics.]
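Zero-shot queries over such a map are typically answered by embedding a text prompt with an open-vocabulary encoder (e.g., CLIP) and scoring it against the fused instance embeddings. The sketch below assumes the codebook layout from the fusion sketch above; the cosine-similarity matching is a generic scheme, not necessarily OmniMap's.

```python
import numpy as np

def query_instances(codebook, text_embedding, top_k=3):
    """Rank map instances by cosine similarity between their fused embeddings
    and a text embedding from an open-vocabulary encoder such as CLIP."""
    ids = list(codebook)
    E = np.stack([codebook[i][0] for i in ids])            # (num_instances, embed_dim)
    E = E / (np.linalg.norm(E, axis=1, keepdims=True) + 1e-8)
    q = text_embedding / (np.linalg.norm(text_embedding) + 1e-8)
    scores = E @ q
    order = np.argsort(-scores)[:top_k]
    return [(ids[j], float(scores[j])) for j in order]
```

Because the voxel votes tie every instance ID back to 3D space, the top-scoring IDs can be mapped directly to voxel regions, yielding the segmentation masks shown above.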

Online Mapping


[Video: online mapping on Replica room_0]

[Video: online mapping on ScanNet scene0106_00]



Applications

Scene Q&A

[Demos: scene Q&A on 2D and 3D scenes]


Interactive Editing

[Demos: interactive editing on 2D and 3D scenes]


In-place Manipulation


[Video: in-place manipulation]

OmniMap supports robotic-arm manipulation by reconstructing the optics, geometry, and semantics of the workspace.


Mobile Navigation


[Video: mobile navigation]

OmniMap supports map-assisted navigation experiments with a mobile robot.