Structuring unstructured data is a correct use of AI
The Prime Minister recently announced Extract, a project by colleagues at i.AI.
Very briefly, Extract uses large language models (LLMs) and segmentation models like Segment Anything to convert untidy PDFs of planning regulations into sparkling GeoJSON-plus-metadata, ready to be fed into digital planning systems.
Extract addresses real inefficiencies in the English planning system, but I also think it’s an ideal AI product, for three reasons.
All killer
Firstly, Extract can finish. It takes untidy data and it gives it structure, and once the data is structured the outputs can be verified. If necessary, they can be corrected. You’re left with an asset: your data, suitable for building upon.
Secondly, Extract’s output has ready consumers: a rich existing ecosystem of planning data and planning tools. PlanX, the digital planning software that accepts Extract’s data, would not exist without many years of investment by civil servants and others to better structure planning data and digitise the planning process. This means Extract has no need to define its own output schemas or create a market for its outputs.
Thirdly, Extract makes no decisions, though its outputs may be used as inputs to decision-making tools. This makes Extract much more adoptable than tools that do make decisions.
No filler
Structuring unstructured data might be the biggest opportunity space for AI. It produces tangible assets. It is adoptable. It is verifiable. And it is the sort of drudge work that people don’t have time for.
If you want to make a useful, adoptable AI application, consider whether you can make it have these qualities:
- it can finish
- its outputs have ready consumers
- it does not make decisions