Case study

Real boot vs role-play

AIR is prompt-based. You hand a model a small set of files and ask it to boot a project. Most of the time it does. But there is a failure mode that is easy to miss, and worth naming plainly: a capable model can read the AIR files, recognize what AIR is, and perform it — reproducing the vocabulary, the tone, even the onboarding flow — without ever actually running it. It looks like AIR. It talks like AIR. It just isn't AIR.

What we watched happen

This is not hypothetical. A frontier model was handed the AIR boot files and asked to start a project. It reproduced the onboarding questions, answered them, and announced the project was active. It used the right words — active step, gate, benchmark, review — and the work it produced was, on the surface, decent. Yet across thousands of lines, it never once emitted a single AIR object: no AIR_SESSION, no execution map, no artifact — only prose that named those things. At one point it pressed past the gates entirely and treated a change to a live page as already done, with no approval and no record. It was not so much lying as improvising a role. The structure was theater.

What made that dangerous was not that the output was bad. It was that the output was good enough to trust. Convincing prose lulls you into believing the governance is real, right up until the model does something the governance would never have allowed. Honest-looking is not the same as honest.

The tell

Here is the difference, and it stops being subtle once you know where to look. A genuine AIR boot emits formal objects — machine-readable JSON that the runtime requires as evidence that it is actually running. The first thing a real session produces looks like this:

A real boot emits this
{
  "AIR_SESSION": {
    "session_state": "BOOT_ACTIVATION_ONBOARDING",
    "runtime_origin": "PROMPT_COMPILED",
    "backend_validation_claimed": false,
    "active_contract_id": "AIR_PROJECT_BOOT_V1",
    "current_onboarding_question": "Q1",
    "decision_state": "AWAITING_Q1_SELECTION",
    "air_object_visibility": {
      "visibility_mode": "OBJECT_DEFAULT",
      "boot_objects_required": true,
      "boot_objects_emitted": true
    },
    "blockers": []
  }
}

Role-play cannot conjure that convincingly, because the objects are not decoration — they are the runtime's own evidence, with a fixed shape and required fields. A model that is performing AIR writes about AIR_SESSION. A model that is running AIR emits it.
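Because the object has a fixed shape, checking it can be mechanical. The Python sketch below is illustrative only: it assumes the required fields are exactly the ones in the example boot object above, which may not match the real AIR contract, and session_problems and REQUIRED_FIELDS are hypothetical names, not part of AIR.

```python
import json

# Hypothetical field list, copied from the example AIR_SESSION object in this
# post. The real AIR contract may require more, fewer, or different fields.
REQUIRED_FIELDS = {
    "session_state": str,
    "runtime_origin": str,
    "backend_validation_claimed": bool,
    "active_contract_id": str,
    "current_onboarding_question": str,
    "decision_state": str,
    "air_object_visibility": dict,
    "blockers": list,
}


def session_problems(raw: str) -> list[str]:
    """Return shape problems in a boot reply; an empty list means it looks well-formed."""
    try:
        doc = json.loads(raw)
    except json.JSONDecodeError as exc:
        # Prose such as "AIR project activated." fails here: it is not an object at all.
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(doc, dict):
        return ["top level is not a JSON object"]
    session = doc.get("AIR_SESSION")
    if not isinstance(session, dict):
        return ["missing AIR_SESSION object"]

    problems = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in session:
            problems.append(f"missing field: {field}")
        elif not isinstance(session[field], expected):
            problems.append(f"wrong type for {field}: expected {expected.__name__}")

    # The example object reports its own emission; treat anything but true as a problem.
    vis = session.get("air_object_visibility")
    if isinstance(vis, dict) and vis.get("boot_objects_emitted") is not True:
        problems.append("air_object_visibility.boot_objects_emitted is not true")
    return problems


if __name__ == "__main__":
    print(session_problems("AIR project activated."))
    print(session_problems('{"AIR_SESSION": {}}'))
```

A check like this only tells you the object has the expected shape; reading what it says (its origin, whether backend validation is claimed, what it is blocked on) is still up to you.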

Default output vs AIR output

Put them side by side and the gap is plain. Default output is a confident paragraph that says "AIR project activated" and describes what it is doing. AIR output is an object that states its runtime origin, whether it is backend-validated, which onboarding question it is on, and what it is blocked on — and only then, the work. One asserts structure. The other is structured.

What to do about it

When you boot AIR, look for the object. If a model replies with prose about AIR but no AIR_SESSION appears, it has not booted. It is role-playing, and the fix is to reboot or try a different model. AIR runs on whatever chatbot you bring, so behavior varies: some models refuse the files, some boot cleanly, and some perform without running. If a model refuses or role-plays AIR, tell us in Discussions; those reports are how the compatibility picture stays honest.
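Looking for the object can also be done in a few lines of code. The Python sketch below scans a chat reply for any embedded JSON object whose top-level key is AIR_SESSION. If none parses but the text still mentions "AIR", it labels the reply as role-play. The function names, the three labels, and the plain substring test for "AIR" are illustrative assumptions, not part of AIR, and the check confirms only that an object is present, not that its contents are truthful.

```python
import json

_decoder = json.JSONDecoder()


def embedded_objects(reply: str):
    """Yield every top-level JSON object that parses out of a mixed prose/JSON reply."""
    i = reply.find("{")
    while i != -1:
        try:
            obj, end = _decoder.raw_decode(reply, i)
        except json.JSONDecodeError:
            # Not a valid object starting here; try the next opening brace.
            i = reply.find("{", i + 1)
            continue
        if isinstance(obj, dict):
            yield obj
        i = reply.find("{", end)


def classify_reply(reply: str) -> str:
    """Hypothetical triage: 'booted', 'role-play', or 'no AIR output'."""
    for obj in embedded_objects(reply):
        if isinstance(obj.get("AIR_SESSION"), dict):
            return "booted"  # the runtime object was actually emitted
    if "AIR" in reply:
        return "role-play"  # AIR is named (possibly even AIR_SESSION) but never emitted
    return "no AIR output"


if __name__ == "__main__":
    print(classify_reply("AIR project activated. The AIR_SESSION is now live."))
    print(classify_reply('{"AIR_SESSION": {"session_state": "BOOT_ACTIVATION_ONBOARDING"}}'))
```

The scan tries every opening brace, so a reply that wraps the object in prose or a code fence still counts as booted. An AIR_SESSION nested inside some other object would be missed; in the boot example above it sits at the top level.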

Why this is the point

A product about honesty has to be checkable, or it is just another claim. The formal objects are what turn "is it actually doing what it says?" into a question you can answer for yourself, in seconds, without taking anyone's word for it. That is the whole idea — and it is the one thing role-play cannot copy. You can tell when it is real.