New Benchmark ‘Theory of Space’ Reveals Exploration Bottlenecks and Modality Gaps in Leading Foundation Models

Stanford AI Lab has introduced ‘Theory of Space,’ a new benchmark designed to test whether foundation models can construct, revise, and exploit spatial beliefs through active exploration. An evaluation of six state-of-the-art models, including GPT-4V, Gemini Pro, and LLaVA-1.5, revealed three critical limitations in their spatial understanding: an ‘exploration bottleneck,’ a persistent ‘text-vision modality gap,’ and severe ‘belief inconsistencies.’

Researchers found that although these models can grasp spatial relationships when the information is presented to them directly, they struggle to explore environments on their own and to learn through trial and error. The study underscores the need to strengthen spatial reasoning before AI systems can operate autonomously and effectively in real-world settings.
