Modifying Point Clouds with LLMs

Learn how to build an agent that modifies 3D point clouds using natural language. Discover techniques for LLM spatial reasoning, structured scene graphs, and systematic evaluation for reliable performance.

Overview

I’m presenting an agent with a natural-language interface that understands and modifies 3D point clouds. Users can ask it to identify objects, segment regions, or even move elements, and it executes those tasks directly on the point cloud data.

The core challenge is getting an LLM to reason meaningfully over 3D spatial structures. That requires careful prompt engineering, tooling, and a structured scene graph that serves as the agent’s “world model” in place of raw point data. Getting reliable behavior meant running systematic evals: testing the agent across varied phrasings, ambiguous spatial queries, and edge cases like occluded or overlapping objects, then iterating on the tool definitions and system prompt until performance was consistent.
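To make the eval loop concrete, here is a minimal sketch of a phrasing-robustness check. It is an illustration under assumptions, not the harness used in the talk: `EvalCase`, `run_evals`, the tool names `move_object` and `segment_region`, and the object ids are all hypothetical, and `toy_agent` is a keyword-matching stand-in for the real LLM call.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    phrasings: list[str]   # different wordings of the same request
    expected_tool: str     # hypothetical tool name the agent should pick
    expected_target: str   # hypothetical object id the tool should act on


def run_evals(
    agent: Callable[[str], tuple[str, str]], cases: list[EvalCase]
) -> dict[str, float]:
    """Return the pass rate per case, measured across all its phrasings."""
    results = {}
    for case in cases:
        expected = (case.expected_tool, case.expected_target)
        passed = sum(agent(p) == expected for p in case.phrasings)
        key = f"{case.expected_target}:{case.expected_tool}"
        results[key] = passed / len(case.phrasings)
    return results


def toy_agent(prompt: str) -> tuple[str, str]:
    """Stand-in for the LLM agent: naive keyword matching (assumption)."""
    text = prompt.lower()
    tool = "move_object" if ("move" in text or "shift" in text) else "segment_region"
    target = "chair_1" if "chair" in text else "table_1"
    return tool, target


CASES = [
    EvalCase(
        ["Move the chair left", "Shift the chair to the left", "Slide the chair over"],
        "move_object",
        "chair_1",
    ),
    EvalCase(
        ["Segment the table", "Isolate the table points"],
        "segment_region",
        "table_1",
    ),
]

scores = run_evals(toy_agent, CASES)
for name, rate in scores.items():
    print(f"{name}: {rate:.2f}")
```

The stand-in agent misses the “Slide the chair over” phrasing, which is the kind of gap a per-case pass rate surfaces: a score below 1.0 points to a tool definition or system prompt that needs another iteration.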

The builder takeaway: treat your data representation as a first-class design decision. The right abstraction layer between the LLM and your domain data is what makes or breaks agent reliability.
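As a rough illustration of such an abstraction layer, the sketch below keeps raw points inside scene-graph nodes and hands the LLM only a compact JSON summary. Everything here is an assumption made for the example: `SceneNode`, `summarize`, `move_object`, and the field names are hypothetical and not taken from the talk.

```python
import json
from dataclasses import dataclass

Point = tuple[float, float, float]


@dataclass
class SceneNode:
    object_id: str
    label: str
    points: list[Point]  # raw points stay here and never enter the prompt

    def centroid(self) -> Point:
        n = len(self.points)
        return tuple(sum(p[i] for p in self.points) / n for i in range(3))

    def bounds(self) -> tuple[Point, Point]:
        mins = tuple(min(p[i] for p in self.points) for i in range(3))
        maxs = tuple(max(p[i] for p in self.points) for i in range(3))
        return mins, maxs


def summarize(nodes: dict[str, SceneNode]) -> str:
    """Build the compact description the LLM reasons over (hypothetical schema)."""
    summary = []
    for node in nodes.values():
        mins, maxs = node.bounds()
        summary.append({
            "id": node.object_id,
            "label": node.label,
            "centroid": [round(c, 2) for c in node.centroid()],
            "bbox_min": list(mins),
            "bbox_max": list(maxs),
            "num_points": len(node.points),
        })
    return json.dumps(summary)


def move_object(nodes: dict[str, SceneNode], object_id: str, offset: Point) -> None:
    """Tool the agent can call: translate every point of one object."""
    dx, dy, dz = offset
    node = nodes[object_id]
    node.points = [(x + dx, y + dy, z + dz) for x, y, z in node.points]


scene = {
    "chair_1": SceneNode("chair_1", "chair", [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]),
}
move_object(scene, "chair_1", (1.0, 0.0, 0.0))
print(summarize(scene))
```

In this design the model only names an object id and an offset; the tool applies the change to the underlying points, so prompt size stays independent of point count.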

Links

Tech stack