Nicholas Jiang's Research

The Unreasonable Effectiveness of Non‑Overlapping Failures in LLM Prover Ensembles

Improving the reliability of mathematical reasoning in large language models (LLMs) is critical for applications in education, automated theorem proving, and formal verification. This paper investigates whether the functional diversity of prover models, specifically their non-overlapping failures, can be harnessed through ensembling to improve collective performance. We present a theoretical risk decomposition for an OR-aggregated ensemble of theorem provers, in which a statement counts as proved if any prover proves it. We show that the ensemble's risk equals the average individual risk minus an 'ambiguity effect' that quantifies the diversity of the provers. This formalizes the intuition that diversity, defined as non-overlapping failures, is strictly beneficial in this setting. We hypothesize that such ensembles may not only surpass the accuracy of any individual model but could also generate proofs for statements previously unprovable by any single prover. Finally, we aim to investigate whether these techniques can be applied to current state-of-the-art models to push performance on harder, unsaturated benchmarks such as PutnamBench.
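The decomposition can be illustrated numerically. The sketch below is a minimal illustration, not the paper's formal framework. It assumes a binary failure matrix `fail[i, j]` (1 if prover `i` fails on problem `j`) and a 0/1 loss. Under OR-aggregation, the ensemble's per-problem loss is the minimum over provers, and the "ambiguity" is taken here as the gap between the average individual loss and that minimum. The function name and the toy data are hypothetical.

```python
import numpy as np


def or_ensemble_decomposition(fail) -> dict:
    """Decompose OR-ensemble risk for a 0/1 failure matrix (illustrative sketch).

    Assumed encoding: fail[i, j] = 1 if prover i fails on problem j, else 0.
    Under OR-aggregation the ensemble fails on problem j only if every prover fails.
    """
    fail = np.asarray(fail, dtype=float)
    # Per-problem ensemble loss: 1 only when all provers fail (minimum over provers).
    ensemble_loss = fail.min(axis=0)
    # Per-problem average of the individual provers' losses.
    avg_loss = fail.mean(axis=0)
    # Assumed ambiguity term: nonzero only on problems where some provers fail
    # but at least one succeeds, i.e. where failures do not overlap.
    ambiguity = avg_loss - ensemble_loss
    return {
        "ensemble_risk": float(ensemble_loss.mean()),
        "avg_individual_risk": float(avg_loss.mean()),
        "ambiguity": float(ambiguity.mean()),
    }


# Three hypothetical provers on four problems. Each prover fails on 3 of 4,
# but only the last problem defeats all of them.
fail = np.array([
    [1, 0, 1, 1],
    [0, 1, 1, 1],
    [1, 1, 0, 1],
])
d = or_ensemble_decomposition(fail)
# The identity ensemble risk = average individual risk - ambiguity holds by construction.
assert np.isclose(d["ensemble_risk"], d["avg_individual_risk"] - d["ambiguity"])
```

In this toy example, every prover has risk 0.75, yet the ensemble's risk is 0.25. The ambiguity term (0.5) accounts exactly for the gap. It is contributed entirely by the three problems on which the provers' failures do not overlap.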

Nicholas Jiang, Joe Zhou, Rishabh Sharma

Proposal

Diversity-Driven Generalization in Mathematical Reasoning Ensembles

We propose a framework for studying collaborative generalization in mathematical reasoning systems by training ensembles of solvers to learn from one another's complete solutions in a fully self-supervised setting. Using the miniF2F benchmark, we construct ensembles of varying diversity, quantified by a novel Task2Vec-based Ensemble Diversity Coefficient (EDC). We fine-tune solvers on peer-generated proofs and evaluate generalization to held-out problems. We hypothesize that higher EDC predicts greater improvement on these held-out problems, which would identify diversity as a key factor in enabling ensemble-based peer learning.
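The proposal does not spell out how the EDC is computed. The sketch below is one plausible reading and should be treated as an assumption. It supposes that each solver is already summarized by a Task2Vec-style embedding vector, for example one derived from the solver's generated proofs. It then defines the EDC as the mean pairwise cosine distance between those vectors, in the spirit of the Task2Vec diversity coefficient. The function names are hypothetical, and the embeddings below are toy vectors rather than real Task2Vec outputs.

```python
from itertools import combinations

import numpy as np


def cosine_distance(u: np.ndarray, v: np.ndarray) -> float:
    """1 - cosine similarity between two nonzero vectors."""
    return 1.0 - float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def ensemble_diversity_coefficient(embeddings) -> float:
    """Hypothetical EDC: mean pairwise cosine distance between per-solver embeddings.

    Assumes each entry is a fixed-length embedding (e.g. Task2Vec-style) for one solver.
    """
    embs = [np.asarray(e, dtype=float) for e in embeddings]
    if len(embs) < 2:
        raise ValueError("EDC needs at least two solvers")
    # Average the distance over every unordered pair of solvers.
    dists = [cosine_distance(a, b) for a, b in combinations(embs, 2)]
    return float(np.mean(dists))


# Toy embeddings: identical solvers versus solvers pointing in orthogonal directions.
identical = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
orthogonal = [[1.0, 0.0], [0.0, 1.0]]
edc_identical = ensemble_diversity_coefficient(identical)
edc_orthogonal = ensemble_diversity_coefficient(orthogonal)
```

Under this reading, an ensemble of identical solvers has an EDC of 0. Solvers whose embeddings are orthogonal have an EDC of 1. Taking the mean over all pairs keeps the score comparable across ensembles of different sizes.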

Nicholas Jiang, Joe Zhou, Rishabh Sharma, Sarvesh Sivakumar

Proposal Video Presentation
