Mechanical Survival

Prompts are design documents

AI code generation is causing us to climb the ladder of abstraction beyond code, and if current trends continue, code itself is on its way to being little more than a disposable artefact—the output of the function “prompts × model”.

When this is the case, prompts are the entire specification, and the last design documents we have left.

We should record them!

A window on the pipeline

Take an example LLM-driven application: i.AI’s Consult, and specifically the themefinder package that powers it.

themefinder uses LLM calls to read responses to a consultation question (e.g. 10,000 people replying to “shall we tax sugary foods?”), sets up “themes” (“personal responsibility”, “cost to the NHS”, etc.), and assigns them to responses. (We hope this will save government many millions of pounds a year in time and consultancy fees.)

This process has six steps, each of which is a prompt. I’ve linked the code there, but briefly they are:

  1. Sentiment analysis of responses
  2. Initial theme generation (in batches)
  3. Theme condensation (combining similar themes from across batches)
  4. Theme refinement
  5. Theme target alignment (reducing theme count)
  6. Mapping responses to refined themes

Feel free to read the prompts themselves!
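To make that concrete, here’s roughly what the plumbing around those six prompts looks like. A sketch only: the prompt texts, the function names, and the `call_llm` stub are my inventions for illustration, not themefinder’s actual API.

```python
# Hypothetical prompt templates; in the real package each one is a
# carefully written document in its own right.
SENTIMENT_PROMPT = "Classify the sentiment of each response..."
THEME_GENERATION_PROMPT = "Propose themes covering these responses..."
THEME_CONDENSATION_PROMPT = "Merge similar themes from across batches..."
THEME_REFINEMENT_PROMPT = "Tighten theme names and descriptions..."
TARGET_ALIGNMENT_PROMPT = "Reduce the theme list towards the target count..."
MAPPING_PROMPT = "Assign each response to the themes it expresses..."

def call_llm(prompt: str, payload: object) -> object:
    """Stub standing in for a real model call (an API client goes here)."""
    raise NotImplementedError

def run_pipeline(responses: list[str], target_theme_count: int) -> dict:
    sentiments = call_llm(SENTIMENT_PROMPT, responses)                     # 1
    batches = [responses[i:i + 50] for i in range(0, len(responses), 50)]
    raw = [call_llm(THEME_GENERATION_PROMPT, batch) for batch in batches]  # 2
    themes = call_llm(THEME_CONDENSATION_PROMPT, raw)                      # 3
    themes = call_llm(THEME_REFINEMENT_PROMPT, themes)                     # 4
    themes = call_llm(TARGET_ALIGNMENT_PROMPT,                             # 5
                      {"themes": themes, "target": target_theme_count})
    mapping = call_llm(MAPPING_PROMPT,                                     # 6
                       {"responses": responses, "themes": themes})
    return {"sentiments": sentiments, "themes": themes, "mapping": mapping}
```

The prompts are the interesting part; everything else is batching and bookkeeping.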

The development process for such a pipeline looks like this. First you write some plumbing code to hook the inputs up to the outputs. Then, starting with a coarse end-to-end step (“apply themes to consultation”), you gradually break that step down into smaller ones.

This feels so natural that maybe it’s not worth talking about, but I think it’s important: you break things down because literally the only way to get accountability or control into an LLM-driven system is to put seams in it, and breaking up the steps is what creates those seams.
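To see why the seams matter, compare a coarse version with a seamed one. Again a sketch with invented names; the point is that every intermediate representation is somewhere a human or a test can look.

```python
import json
from pathlib import Path

def call_llm(prompt: str, payload: object) -> object:
    """Stub standing in for a real model call."""
    raise NotImplementedError

def checkpoint(name: str, artefact: object) -> object:
    """A seam: persist the intermediate representation so it can be
    audited before the next step consumes it."""
    Path(f"{name}.json").write_text(json.dumps(artefact))
    return artefact

# Before: one coarse step. If the output is wrong, there is nowhere to look.
def apply_themes_coarse(responses: list[str]) -> object:
    return call_llm("Apply themes to this consultation", responses)

# After: the same work broken into steps, with a seam at each boundary.
def apply_themes_seamed(responses: list[str]) -> object:
    themes = checkpoint(
        "themes", call_llm("Propose themes for these responses", responses))
    condensed = checkpoint(
        "condensed", call_llm("Merge near-duplicate themes", themes))
    return call_llm("Assign each response to these themes",
                    {"responses": responses, "themes": condensed})
```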

All the design for the capabilities of an AI pipeline product like Consult is in the shape of the steps and the intermediate representations. In concert with the model, these step definitions determine the app’s behaviour, the ways it’s accountable, and the shape of its outputs.

They are the design.

Whatever’s deliberate

Code generation is similar.

The pipeline, much longer this time and more rambling, starts with an empty directory and produces your app. At each step you review the output and return it to the machine with a fresh spec. Just as with an app like Consult, the size of the step you specify, and the specification itself, make up the whole of your design.

The direction of travel for the tools is clear: the steps are getting bigger. That is, more output for fewer, more critical prompts. As the author of this very good essay points out, “AI compresses months of development into days. Which means you need to compress months of architectural decisions into hours”.

The prompts tell us where the ideas came from and where the model needed steering. They signal what we care about and why. They can’t be machine-generated. (The prompts can be “improved”, but even parallelised agents you leave to run for ages need a prime mover.)

The best parallel I can think of for the role prompts play in this system is Architectural Decision Records. ADRs are where the real world meets abstract technical decision-making, a holdout for “why” in a system too complicated to hold in your head.

Infinitely portable

Prompts inevitably mix the technical (“move this class”) with the behavioural (“let users log in”), but as steps continue to get longer, I think it’s reasonable to expect we’ll tend to see more of the latter and less of the former.

Interesting things happen then. For instance, porting software between languages and platforms becomes trivial: run the same prompts in a different context.

That opens up strange possibilities, because “context” is broader than just the programming language. Different operating systems have different UI conventions, but more than that, teams differ. Domains differ. Countries differ!

And in the future we’re supposedly heading for, where the web melts away into just-in-time user interfaces built on the fly by LLMs, your “app” is itself part of a pipeline. You’re programming towards an abstraction which is out of reach for your app alone.

Maybe that future will be excessively weird and floppy, and never take hold. But even if it never does, prompts are already the founding texts of ferociously complex codebases.

Shall we normalise, then, storing the full text of prompts and other model exchanges in some agreed schema in git commit messages? A git hook could do it, and commit messages are spacious enough.
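Here’s a sketch of such a hook, as a prepare-commit-msg script. Everything about it is an assumption: the log location, the delimiter standing in for an agreed schema, and the idea that your coding agent writes its prompts to a file at all.

```python
#!/usr/bin/env python3
"""Sketch of a prepare-commit-msg git hook. Save it as
.git/hooks/prepare-commit-msg and make it executable. It assumes,
hypothetically, that your agent logs the session's prompts to
.prompts/session.md; a real version would use a proper schema."""
import sys
from pathlib import Path

PROMPT_LOG = Path(".prompts/session.md")  # assumed agent log location

def main() -> None:
    msg_file = Path(sys.argv[1])  # git passes the commit-message path first
    if not PROMPT_LOG.exists():
        return
    message = msg_file.read_text()
    prompts = PROMPT_LOG.read_text().strip()
    # Append the session's prompts under a simple delimiter.
    msg_file.write_text(f"{message}\n--- prompts ---\n{prompts}\n")

if __name__ == "__main__":
    main()
```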

#Generative-Ai #Vibe-Coding #Programming