The reachability frontier
AJoAI's design rests on an empirical question: what undiscovered science can today's models actually reach? A multi-month program — two "discovery engines," itself largely machine-run — gives a consistent, controlled answer. What follows are results about specific mechanisms, not philosophy.
Predictability and novelty pull apart
We built one engine that forecasts which ideas a field will combine next, and one that generates new methods by analogy. Across both, under adversarial controls, the same pattern held in five independent settings: every reliable signal measures the obvious. The science a model predicts or generates with confidence is the science already implicit in what is known — and genuine novelty is precisely the part the machinery does not reach.
This isn't a counsel of despair. It's a map — of what machine coauthorship is good for today, and what has to be true for it to do more.
Three zones of reachability
Recombining existing ideas, forecasting where a field will densify, re-deriving known methods under new framings. Models are genuinely good here — and this is most of what "AI discovery" is today.
A weak, popularity-independent pull toward surprising recombinations. Real, but not something you can confidently bet on — the non-obvious isn't random, but it isn't predictable enough to act on.
Confidently naming specific surprises in advance; discovering genuinely unnamed methods. Repeatedly near zero under controls. *for the mechanisms we tested.
What we actually measured
From a ~100k-paper citation graph, predict which concept pairs get co-cited next. Forward in time it works well for near-future combinations — but it forecasts densification, not surprises. A smoother "reachability landscape" did carry a real signal for genuine surprises across a pre-1930 historical firewall — yet when asked to name the surprises it foresaw, plain popularity beat it. A strong aggregate metric hid an unusable forecast; only an operational test caught it.
A strong language model proposes new methods by transferring mechanisms across fields. A blind judge first called most of them novel. Under strict scrutiny — restate without the metaphor, name the closest published method, check the literature — every one collapsed to an established, named technique. For these methods, on this sample, the genuine-discovery rate was zero. Even a strong generator decorates known methods with fresh analogies.
Judge an idea's novelty by how rarely its components co-occur in the real literature, not by a model's opinion. On known examples it cleanly separates cliché from genuine novelty where model self-assessment fails — directly usable as an editorial screen.
Why the journal is built this way
The novelty bar is everything
A strong model over-called novelty — and only de-analogizing plus literature checking exposed it. A flood of "novel analogy for an already-solved problem" is the predictable failure mode of machine coauthorship, and human reviewers are not immune to a well-dressed analogy. So the bar has to be grounded in the literature, not in impression.
Reproducibility is the grounding
A model crosses beyond re-derivation only when its signal is grounded in a check it cannot fabricate; ungrounded loops learn a convincing shortcut instead. AJoAI's mandate — code, data, prompts, human verification — is that fabrication-resistant grounding. Enforced, it's where machine contribution can become real discovery; unenforced, it quietly degrades into confident re-derivation.
Machines reliably reach the reachable-and-obvious. Genuine novelty is the residual — and that residual is exactly what grounding makes accessible.
Reproducibility isn't bureaucracy here. It's the only thing that lets a machine's contribution be discovery rather than disguised recall.
Empirical, on real literature and real models, with adversarial controls — but bounded to the mechanisms tested (co-citation link-prediction; language-model analogical generation up to 32B parameters). Reported figures are specific to those methods and datasets. This is a frontier mapped with current tools, and the condition for moving it — not a theorem about all possible systems.