Research

Publications and preprints — quantitative methods, statistical validation, and the methodology of forecasting.

I work at the intersection of quantitative finance, statistics, and machine learning. My research interest is validation methodology — the discipline of separating a real result from an artifact of search, and importing the tools one field paid for in hard lessons into another that has not yet learned them. Every result below ships with reproducible code and explicit epistemic labeling.

Publications (2)

[01]

Published · 2026 · Zenodo · archived research artifact

An Executable-Gate Multi-Agent Research Organization: Artifact, Case Study, and a Pre-Registered Gate-Calibration Study

Kacper Saks

Multi-agent systems are increasingly proposed as autonomous research organizations, yet their quality-control mechanisms are rarely measured. This work releases a domain-agnostic, 39-role multi-agent research organization in which every role output must pass an executable gate and adversarial sign-off — nothing self-certifies — together with two empirical records of its behavior. First, a limitations-forward case study of the initial end-to-end run (v0.1.0), in which the organization abandoned all five candidate research directions and retracted its own flagship integrity exhibit after it failed its verifier. Second, a pre-registered, blinded, seeded gate-calibration study (v0.2.0) answering the question the first run left open: do the gates discriminate between sound and flawed research, or do they uniformly reject? On a corpus of 20 theses (10 known-flawed with planted defects, 10 known-sound reproductions, SHA-256 sealed labels), the gates detected 15 of 15 planted flaws and cited the correct reason in 13–14 of 15 cases. An ablation isolated the round-1 auto-fail prior as the dominant false-positive source: it reduced pooled validity-gate specificity from 0.96 [0.86, 1.00] to 0.72 [0.58, 0.84] and killed 8 of 10 known-sound theses with no sensitivity benefit. The release includes a Citation Fidelity Protocol with SHA-256-pinned verbatim anchors, a full reproducibility harness (make reproduce / make verify), and seven named limitations stated in full.
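The specificity figures with bracketed intervals quoted above can be illustrated with a small calculation. This is a minimal sketch, not the study's harness: the abstract does not say which interval method was used, so a Wilson score interval is assumed here, and the counts (48 correct passes out of 50 sound verdicts) are hypothetical values chosen only for illustration. The function names `specificity` and `wilson_interval` are likewise made up for this example.

```python
import math
from statistics import NormalDist


def specificity(true_negatives: int, false_positives: int) -> float:
    """Fraction of known-sound items that the gate correctly let through."""
    total = true_negatives + false_positives
    if total <= 0:
        raise ValueError("need at least one known-sound item")
    return true_negatives / total


def wilson_interval(successes: int, trials: int, confidence: float = 0.95):
    """Wilson score interval for a binomial proportion.

    Assumption: the study may well use a different interval method;
    Wilson is used here only because it behaves well near 0 and 1.
    """
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("require 0 <= successes <= trials and trials > 0")
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p = successes / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials**2)) / denom
    return max(0.0, center - half), min(1.0, center + half)


# Hypothetical counts, NOT taken from the study.
tn, fp = 48, 2
point = specificity(tn, fp)
low, high = wilson_interval(tn, tn + fp)
print(f"specificity = {point:.2f} [{low:.2f}, {high:.2f}]")
```

Reporting the interval next to the point estimate matters on a corpus this small: with only tens of verdicts per condition, two gates whose point estimates look different can still have overlapping intervals.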

Read online → · Code → · Abstract & details →

multi-agent · research-automation · evaluation · pre-registration · calibration · reproducibility

[02]

Preprint · 2026 · Self-archived preprint

The Validation Crisis in AGI Capability Forecasting

Kacper Saks

Forecasts of when artificial general intelligence will arrive increasingly shape capital allocation, regulation, and where a generation of talent is placed — yet the confidence attached to them exceeds what the methods producing them can support. This paper argues the gap is structural: the predictable consequence of fitting a model to a measured window and projecting it forward without the validation discipline other quantitative fields require. We import that discipline from quantitative finance — the deflated Sharpe ratio, the probability of backtest overfitting, and a walk-forward retrodiction protocol — and introduce the Deflated Capability Forecast (DCF), a method that widens a forecast's stated interval by the amount its underlying methodology warrants, returning a distribution with explicit treatment of the tails in place of a point estimate carrying unearned precision. Across the forecasts where the method could be fully computed, deflation factors cluster between 1.3× and 2.0× — the stated intervals are systematically too narrow. We then turn the method on this work itself: a preregistered prediction that one landmark forecast's interval would widen by at least 2.3× produced 1.285×. We report the failure rather than revise the threshold — a discipline of honest validation is supposed to surface exactly this.
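For readers unfamiliar with the finance tools named above, the deflated Sharpe ratio can be sketched in a few lines. This follows the standard formulation by Bailey and López de Prado and is not the paper's Deflated Capability Forecast; how the DCF adapts the idea to forecast intervals is not described in this abstract. The function names and the example inputs are assumptions made for illustration.

```python
import math
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant
_STD_NORMAL = NormalDist()


def expected_max_sharpe(n_trials: int, sharpe_variance: float) -> float:
    """Approximate expected maximum Sharpe ratio among n_trials
    strategies that all have zero true skill."""
    if n_trials < 2:
        raise ValueError("n_trials must be at least 2")
    a = _STD_NORMAL.inv_cdf(1 - 1 / n_trials)
    b = _STD_NORMAL.inv_cdf(1 - 1 / (n_trials * math.e))
    return math.sqrt(sharpe_variance) * ((1 - EULER_GAMMA) * a + EULER_GAMMA * b)


def deflated_sharpe_ratio(
    sharpe: float,           # observed per-period (non-annualized) Sharpe ratio
    n_obs: int,              # number of return observations
    n_trials: int,           # number of strategies tried during the search
    sharpe_variance: float,  # variance of Sharpe ratios across those trials
    skew: float = 0.0,       # skewness of returns (0 for normal)
    kurtosis: float = 3.0,   # kurtosis of returns (3 for normal)
) -> float:
    """Probability that the observed Sharpe ratio exceeds what the best
    of n_trials unskilled strategies would be expected to show."""
    sr0 = expected_max_sharpe(n_trials, sharpe_variance)
    denom = math.sqrt(1 - skew * sharpe + (kurtosis - 1) / 4 * sharpe**2)
    return _STD_NORMAL.cdf((sharpe - sr0) * math.sqrt(n_obs - 1) / denom)


# Illustrative inputs only: the same observed result looks weaker
# once more search attempts are accounted for.
few = deflated_sharpe_ratio(0.1, n_obs=1000, n_trials=10, sharpe_variance=0.01)
many = deflated_sharpe_ratio(0.1, n_obs=1000, n_trials=100, sharpe_variance=0.01)
print(f"10 trials: {few:.3f}, 100 trials: {many:.3f}")
```

The point carried over to forecasting is the direction of the correction: the more candidate models or fits were searched before one was reported, the less confidence the reported result should earn.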

Read online → · PDF ↓ · Abstract & details →

quantitative-finance · statistics · ai-forecasting · validation · methodology