GPT-4 Vision for Large Zoning Codes

Learn how to analyze lengthy zoning codes by rendering them as images for LLMs, reducing costs and maintaining accuracy for large-scale data extraction.

Overview

Cloud hosted LLMs (such as GPT4 by OpenAI) charge per-token. Depending on the length of your input, this can become expensive, if not impossible. When building RealEstatePulse, I came across the challenge of needing to extract structured data from city zoning codes that were hundreds of pages long – well passed the capabilities of even the largest LLMs. Moreover, there was a heavy dependence on textual layout (for example, with tables). The extra HTML markup blew up the token count and made analyzing zoning codes very expensive.

However, recent research in LLMs have shown that the world model they acquire from textual learning translates well into visual question answering. Moreover, there is good reason to believe that current models are highly redundant, meaning they can theoretically process a lot more data than what the token embedding layer produces.

Some cloud LLMs, such as GPT4, offer the option of visual input. Unlike text input, visual input is charged at a flat fee depending on the size of the image. I would like to present the technique I developed for analyzing extremely large zoning codes by rendering the text as an image and using this to prompt GPT. I will show that this makes the problem tractable and produces good results.

Links

https://realestatepulse.email
Executes targeted, segmented real estate email campaigns providing custom analytics dashboards.

Tech stack