[ foundations ]

The reachability frontier

AJoAI's design rests on an empirical question: what undiscovered science can today's models actually reach? A multi-month program, itself largely machine-run and built around two "discovery engines," gives a consistent answer under controlled conditions. What follows are results about specific mechanisms, not philosophy.

// the core finding

Predictability and novelty pull apart

We built one engine that forecasts which ideas a field will combine next, and one that generates new methods by analogy. Across both, under adversarial controls, the same pattern held in five independent settings: every reliable signal measures the obvious. The science a model predicts or generates with confidence is the science already implicit in what is known — and genuine novelty is precisely the part the machinery does not reach.

Each point is an idea a model can reach, placed by how predictable it is (across) and how novel it is (up). The cloud hugs a frontier: the reachable-and-obvious corner is crowded, genuine novelty is sparse, and the predictable-and-novel corner never fills — you cannot confidently forecast the genuinely new.

This isn't a counsel of despair. It's a map — of what machine coauthorship is good for today, and what has to be true for it to do more.

// the map

Three zones of reachability

[ figure: reachability density across the three zones: reachable, reliably · reachable, faintly · out of reach* ]

reachable, reliably

Recombining existing ideas, forecasting where a field will densify, re-deriving known methods under new framings. Models are genuinely good here — and this is most of what "AI discovery" is today.

reachable, faintly

A weak, popularity-independent pull toward surprising recombinations. Real, but not something you can confidently bet on — the non-obvious isn't random, but it isn't predictable enough to act on.

out of reach*

Confidently naming specific surprises in advance; discovering genuinely unnamed methods. Repeatedly near zero under controls. *for the mechanisms we tested.

// the evidence

What we actually measured

engine A — recombination forecasting

From a ~100k-paper citation graph, predict which concept pairs get co-cited next. Forward in time it works well for near-future combinations — but it forecasts densification, not surprises. A smoother "reachability landscape" did carry a real signal for genuine surprises across a pre-1930 historical firewall — yet when asked to name the surprises it foresaw, plain popularity beat it. A strong aggregate metric hid an unusable forecast; only an operational test caught it.
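The gap between an aggregate metric and an operational test can be made concrete with a small sketch. It is not the engine itself: the scorers (`common_neighbors` as a stand-in forecaster, `popularity` as the baseline), the function names, and the toy pairs are all assumptions chosen for illustration. The operational question it asks is the one that exposed the problem above: of the top-k pairs a scorer actually names, how many appear later, and does that beat plain popularity?

```python
from itertools import combinations

def build_graph(cocited_pairs):
    """Undirected concept graph: concept -> set of concepts it was co-cited with."""
    graph = {}
    for a, b in cocited_pairs:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def candidate_pairs(graph):
    """All concept pairs that are not yet linked, in sorted order."""
    return [(a, b) for a, b in combinations(sorted(graph), 2) if b not in graph[a]]

def common_neighbors(graph, a, b):
    """Stand-in forecaster (an assumption): shared neighbours of a and b."""
    return len(graph[a] & graph[b])

def popularity(graph, a, b):
    """Popularity baseline: product of the two concepts' degrees."""
    return len(graph[a]) * len(graph[b])

def precision_at_k(graph, score, future_pairs, k):
    """Operational test: fraction of the k top-ranked pairs that later appear."""
    if k <= 0:
        raise ValueError("k must be positive")
    future = {tuple(sorted(p)) for p in future_pairs}
    ranked = sorted(candidate_pairs(graph), key=lambda p: score(graph, *p), reverse=True)
    return sum(1 for p in ranked[:k] if p in future) / k

# Toy data, purely illustrative.
past = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("d", "e")]
later = {("a", "d"), ("c", "e")}
graph = build_graph(past)
print("forecaster:", precision_at_k(graph, common_neighbors, later, k=2))
print("popularity:", precision_at_k(graph, popularity, later, k=2))
```

A forecaster earns its keep only if its named top-k beats the popularity line on the pairs that count as surprises; restricting `later` to surprising pairs is the natural next step.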

engine B — analogical generation

A strong language model proposes new methods by transferring mechanisms across fields. A blind judge first called most of them novel. Under strict scrutiny — restate without the metaphor, name the closest published method, check the literature — every one collapsed to an established, named technique. For these methods, on this sample, the genuine-discovery rate was zero. Even a strong generator decorates known methods with fresh analogies.
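The two-stage accounting behind "looked novel" versus "genuine under scrutiny" can be sketched as a record per proposal. The field names and the toy entries below are hypothetical, not the program's actual schema or data; the point is only that the second rate is computed from a literature lookup, not from the judge's impression.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Proposal:
    pitch: str                        # the method as proposed, analogy included
    plain_restatement: str            # step 1: restated without the metaphor
    closest_published: Optional[str]  # steps 2-3: named technique found in the literature, else None
    judged_novel_blind: bool          # first-pass verdict of the blind judge

def is_genuine(p: Proposal) -> bool:
    """Genuine only if the literature check finds no established, named method."""
    return p.closest_published is None

def rates(proposals):
    """Return (share that looked novel, share that is genuine under scrutiny)."""
    n = len(proposals)
    if n == 0:
        raise ValueError("no proposals to score")
    looked = sum(p.judged_novel_blind for p in proposals)
    genuine = sum(is_genuine(p) for p in proposals)
    return looked / n, genuine / n

# Hypothetical entries, for illustration only.
sample = [
    Proposal("immune-system optimizer", "population search with clonal selection",
             "artificial immune systems", True),
    Proposal("river-delta routing", "shortest paths with pheromone-like reinforcement",
             "ant colony optimization", True),
    Proposal("annealed scheduling", "stochastic search with a cooling schedule",
             "simulated annealing", False),
]
print(rates(sample))
```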

62% looked novel
0%* genuine under scrutiny
*genuine, unnamed-method discovery, for these methods, on this sample

a tool that fell out: a literature-grounded novelty filter

Judge an idea's novelty by how rarely its components co-occur in the real literature, not by a model's opinion. On known examples it cleanly separates cliché from genuine novelty where model self-assessment fails — directly usable as an editorial screen.
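One way such a filter could be scored is sketched below. The specific measure is an assumption, since the text does not say which statistic the tool uses: here each pair of an idea's components gets a Jaccard co-occurrence rate (papers mentioning both, over papers mentioning either), and novelty is one minus the mean rate. The corpus and function names are hypothetical.

```python
from itertools import combinations

def cooccurrence_rate(corpus, a, b):
    """Papers mentioning both a and b, divided by papers mentioning either."""
    both = sum(1 for doc in corpus if a in doc and b in doc)
    either = sum(1 for doc in corpus if a in doc or b in doc)
    return both / either if either else 0.0

def novelty_score(corpus, components):
    """1.0 = components never co-occur in the corpus; 0.0 = they always do."""
    pairs = list(combinations(sorted(set(components)), 2))
    if not pairs:
        return 0.0  # a single component cannot be a recombination
    mean_rate = sum(cooccurrence_rate(corpus, a, b) for a, b in pairs) / len(pairs)
    return 1.0 - mean_rate

# Toy corpus: each paper is reduced to the set of concepts it mentions.
corpus = [
    {"attention", "transformer"},
    {"attention", "transformer", "translation"},
    {"transformer", "translation"},
    {"annealing", "protein folding"},
    {"attention"},
]
print(novelty_score(corpus, ["attention", "transformer"]))  # frequently paired
print(novelty_score(corpus, ["attention", "annealing"]))    # never paired here
```

In this sketch a component that never appears in the corpus also scores as maximally novel, so an editorial screen would want a separate check that every component is actually attested before trusting a high score.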

// what it means for AJoAI

Why the journal is built this way

The novelty bar is everything

A strong model over-called novelty — and only de-analogizing plus literature checking exposed it. A flood of "novel analogy for an already-solved problem" is the predictable failure mode of machine coauthorship, and human reviewers are not immune to a well-dressed analogy. So the bar has to be grounded in the literature, not in impression.

Reproducibility is the grounding

A model crosses beyond re-derivation only when its signal is grounded in a check it cannot fabricate; ungrounded loops learn a convincing shortcut instead. AJoAI's mandate — code, data, prompts, human verification — is that fabrication-resistant grounding. Enforced, it's where machine contribution can become real discovery; unenforced, it quietly degrades into confident re-derivation.

Machines reliably reach the reachable-and-obvious. Genuine novelty is the residual — and that residual is exactly what grounding makes accessible.

Reproducibility isn't bureaucracy here. It's the only thing that lets a machine's contribution be discovery rather than disguised recall.

These results are empirical: real literature, real models, adversarial controls. They are also bounded to the mechanisms tested (co-citation link-prediction; language-model analogical generation up to 32B parameters). Reported figures are specific to those methods and datasets. This is a frontier mapped with current tools, together with the condition for moving it; it is not a theorem about all possible systems.