Predicting the outcomes of assisted reproductive technology treatments: a systematic review and quality assessment of prediction models

Review Article

VOLUME 2, ISSUE 1, P1-10, JANUARY 01, 2021


Ian Henderson, M.Sc., Michael P. Rimmer, M.Sc., Stephen D. Keay, M.D., Paul Sutcliffe, Ph.D., Khalid S. Khan, M.Sc., Ephia Yasmin, Ph.D., Bassel H. Al Wattar, Ph.D. 


Objective

Predicting the outcomes of assisted reproductive technology (ART) treatments is desirable, but the adoption of prediction models in clinical practice remains limited. We conducted a systematic review of the literature on prediction models for ART treatments to identify the best-performing models in terms of accuracy, generalizability, and applicability.

Evidence review

We searched electronic databases (MEDLINE, EMBASE, and CENTRAL) through June 2020. We included studies reporting on the development or evaluation of models that predict reproductive outcomes before (pre-ART) or after (intra-ART) the start of treatment in couples undergoing any ART treatment. We evaluated the models' discrimination, calibration, and type of validation, and noted any implementation tools provided for clinical practice.
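For readers less familiar with the two performance measures named above, the following is a minimal illustrative sketch, not taken from the review, of how discrimination (the c-statistic) and a crude calibration check (the observed/expected ratio, sometimes called calibration-in-the-large) can be computed from predicted probabilities and observed outcomes. The function names and the toy data are hypothetical and chosen only for illustration.

```python
from itertools import product


def c_statistic(y_true, y_prob):
    """Discrimination: the proportion of (event, non-event) pairs in which
    the event received the higher predicted risk; ties count as 0.5."""
    events = [p for y, p in zip(y_true, y_prob) if y == 1]
    non_events = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    score = 0.0
    for p_event, p_non_event in product(events, non_events):
        if p_event > p_non_event:
            score += 1.0
        elif p_event == p_non_event:
            score += 0.5
    return score / (len(events) * len(non_events))


def observed_expected_ratio(y_true, y_prob):
    """Crude calibration check: observed number of events divided by the
    sum of predicted probabilities (a value near 1 suggests good agreement)."""
    return sum(y_true) / sum(y_prob)


# Hypothetical toy data: 1 = live birth, 0 = no live birth.
live_birth = [1, 0, 1, 0, 0, 1]
predicted = [0.6, 0.3, 0.4, 0.5, 0.2, 0.7]

print("c-statistic:", round(c_statistic(live_birth, predicted), 3))
print("O/E ratio:", round(observed_expected_ratio(live_birth, predicted), 3))
```

A c-statistic of 0.5 corresponds to discrimination no better than chance and 1.0 to perfect discrimination. The pairwise loop is quadratic in sample size, which is acceptable for a sketch but not for large cohorts.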

Results

We included 69 cohort studies reporting on 120 unique prediction models. About half of the studies reported on pre-ART prediction models (48%) and about half on intra-ART prediction models (56%). The most common predictors were maternal age (90%), tubal factor subfertility (50%), and embryo quality (60%). Only 14 models were externally validated (14/120, 12%), including 8 pre-ART models (Templeton, Nelson, LaMarca, McLernon, Arvis, and the Stolwijk A/I, C, and II models) and 6 intra-ART models (Cai, Hunault, van Loendersloot, Meijerink, Stolwijk B, and the McLernon posttreatment model), with reported c-statistics ranging from 0.50 to 0.78. Ten of these models provided implementation tools for clinical practice, but only 2 reported online calculators.

Conclusion

We identified externally validated prediction models that could be used to advise couples undergoing ART treatments on their reproductive outcomes. The quality of the available models remains limited, and more research is needed to improve their generalizability and applicability in clinical practice.