
UKMLA Statistics & Evidence-Based Medicine: High-Yield Guide

The complete high-yield EBM set for the AKT — sensitivity and specificity, PPV/NPV, RR vs OR, ARR/RRR, NNT, confidence intervals, the study-design hierarchy and bias — with the exam trap for each.

Statistics and evidence-based medicine are where a lot of clinically strong candidates quietly drop marks. The maths feels alien next to a cardiology stem, so it gets left until last and then skipped. That's a mistake: EBM is an explicit part of the content map, the concepts are finite and predictable, and a few focused hours here is some of the highest-yield revision you can do — once you know the formulae, the questions are formulaic.

Here's the whole high-yield set, with the definitions and the exam traps.

1. Diagnostic test statistics

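Build everything from a 2×2 table (disease present/absent vs test positive/negative), where TP, FP, TN and FN are the true positives, false positives, true negatives and false negatives: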

  • Sensitivity = TP / (TP + FN): of those with the disease, how many test positive. A highly Sensitive test, when Negative, rules out (SnNout).
  • Specificity = TN / (TN + FP): of those without the disease, how many test negative. A highly Specific test, when Positive, rules in (SpPin).
  • Positive predictive value (PPV) = TP / (TP + FP): of those testing positive, how many truly have the disease.
  • Negative predictive value (NPV) = TN / (TN + FN): of those testing negative, how many truly don't have it.
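
If you like to check your arithmetic, here is a minimal Python sketch of the four formulas. The `TwoByTwo` class name and the counts (TP 90, FP 30, FN 10, TN 270) are hypothetical, chosen only so the numbers come out cleanly.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Counts from a diagnostic 2x2 table (hypothetical helper for illustration)."""
    tp: int  # disease present, test positive
    fp: int  # disease absent, test positive
    fn: int  # disease present, test negative
    tn: int  # disease absent, test negative

    def sensitivity(self) -> float:
        # Of those WITH the disease, the proportion who test positive
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Of those WITHOUT the disease, the proportion who test negative
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        # Of those who test POSITIVE, the proportion who truly have the disease
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        # Of those who test NEGATIVE, the proportion who truly do not have it
        return self.tn / (self.tn + self.fn)


# Hypothetical counts, for illustration only
table = TwoByTwo(tp=90, fp=30, fn=10, tn=270)
print(f"Sensitivity {table.sensitivity():.2f}")  # 0.90
print(f"Specificity {table.specificity():.2f}")  # 0.90
print(f"PPV {table.ppv():.2f}")                  # 0.75
print(f"NPV {table.npv():.3f}")                  # 0.964
```

Notice that sensitivity and specificity only ever use one column of the table (diseased or not), while PPV and NPV cut across both, which is exactly why the predictive values move with prevalence.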

The trap: sensitivity and specificity are properties of the test and don't change with prevalence; PPV and NPV do depend on prevalence. The same test has a lower PPV (and a higher NPV) in a low-prevalence population. Likelihood ratios (LR+ = sensitivity / (1 − specificity); LR− = (1 − sensitivity) / specificity) are the prevalence-independent way to express the same idea.
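
To see the prevalence effect in numbers, here is a sketch that applies Bayes' theorem to a hypothetical test with sensitivity and specificity both 0.90, first where a quarter of the population has the disease and then where only 1% does. The function names are made up for illustration.

```python
def ppv(sens: float, spec: float, prevalence: float) -> float:
    """Positive predictive value from sensitivity, specificity and prevalence (Bayes' theorem)."""
    true_pos = sens * prevalence               # fraction of population: diseased AND test positive
    false_pos = (1 - spec) * (1 - prevalence)  # fraction of population: healthy AND test positive
    return true_pos / (true_pos + false_pos)


def npv(sens: float, spec: float, prevalence: float) -> float:
    """Negative predictive value from sensitivity, specificity and prevalence."""
    true_neg = spec * (1 - prevalence)         # healthy AND test negative
    false_neg = (1 - sens) * prevalence        # diseased AND test negative
    return true_neg / (true_neg + false_neg)


def likelihood_ratios(sens: float, spec: float) -> tuple[float, float]:
    """LR+ and LR-: these use only sensitivity and specificity, so prevalence never enters."""
    return sens / (1 - spec), (1 - sens) / spec


# The same hypothetical test (sensitivity 0.90, specificity 0.90) in two populations
for prevalence in (0.25, 0.01):
    print(f"prevalence {prevalence:.0%}: PPV {ppv(0.9, 0.9, prevalence):.2f}, "
          f"NPV {npv(0.9, 0.9, prevalence):.3f}")
lr_pos, lr_neg = likelihood_ratios(0.9, 0.9)
print(f"LR+ {lr_pos:.1f}, LR- {lr_neg:.2f}")
```

With the identical test, PPV falls from 0.75 to about 0.08 as prevalence drops from 25% to 1%, while NPV rises and the likelihood ratios (9.0 and 0.11) don't move at all.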

2. Measures of effect

  • Absolute risk reduction (ARR) = control event rate − treatment event rate.
  • Relative risk reduction (RRR) = ARR / control event rate.
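  • Number needed to treat (NNT) = 1 / ARR (with ARR as a proportion), rounded up to a whole number. Number needed to harm (NNH) is calculated the same way from the absolute increase in risk of the harmful outcome.
  • Relative risk (RR) = risk in exposed / risk in unexposed (cohort studies, RCTs). Odds ratio (OR) = odds of the outcome in the exposed / odds in the unexposed (the cross-product ratio of the 2×2 table). OR is the measure used in case-control studies and approximates RR when the outcome is rare.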
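
The RR vs OR distinction is easiest to see side by side. This sketch assumes the common lettering a = exposed with outcome, b = exposed without, c = unexposed with outcome, d = unexposed without; the function names and both tables are hypothetical.

```python
def risk_ratio(a: int, b: int, c: int, d: int) -> float:
    """RR = risk in exposed / risk in unexposed.

    a = exposed with outcome,   b = exposed without outcome
    c = unexposed with outcome, d = unexposed without outcome
    """
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """OR = (a/b) / (c/d) = ad / bc, the cross-product ratio of the same table."""
    return (a * d) / (b * c)


# Hypothetical tables: the same doubling of risk, once for a rare outcome, once for a common one
rare = (10, 990, 5, 995)
common = (40, 60, 20, 80)
for label, cells in (("rare", rare), ("common", common)):
    print(f"{label}: RR {risk_ratio(*cells):.2f}, OR {odds_ratio(*cells):.2f}")
```

Both tables have an RR of 2.00, but the OR is about 2.01 for the rare outcome and about 2.67 for the common one, which is why OR only stands in for RR when the outcome is rare. In a case-control study the investigator fixes how many cases and controls are recruited, so risk (and therefore RR) can't be calculated directly; the OR can.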

The trap: relative figures look dramatic and hide the baseline. A drug that cuts risk from 2% to 1% is a 50% RRR but only a 1% ARR — an NNT of 100. Exam stems lean on this gap on purpose.
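
The 2% to 1% example can be worked through in a few lines. In this sketch `effect_measures` is a hypothetical helper; CER is the control event rate and EER is the experimental (treatment) event rate, both as proportions.

```python
import math


def effect_measures(cer: float, eer: float) -> dict[str, float]:
    """ARR, RRR and NNT from control (CER) and experimental (EER) event rates given as proportions."""
    arr = cer - eer  # absolute risk reduction
    rrr = arr / cer  # relative risk reduction (equivalently 1 - EER/CER)
    # NNT is rounded UP to a whole patient; round() first guards against floating-point noise
    nnt = math.ceil(round(1 / arr, 9))
    return {"ARR": arr, "RRR": rrr, "NNT": nnt}


# The example from the text: risk falls from 2% to 1%
measures = effect_measures(cer=0.02, eer=0.01)
print(f"ARR {measures['ARR']:.0%}, RRR {measures['RRR']:.0%}, NNT {measures['NNT']}")
# ARR 1%, RRR 50%, NNT 100
```

The same 50% RRR with a baseline risk of 20% instead of 2% would give an ARR of 10% and an NNT of 10, which is the whole point: the relative figure alone can't tell you which situation you are in.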

3. Study designs and the hierarchy of evidence

From strongest to weakest for questions of treatment effect: systematic reviews / meta-analyses of RCTs → RCTs → cohort studies → case-control studies → cross-sectional studies → case series/reports → expert opinion. Match the design to the question: RCTs for treatment effect, cohort for prognosis/incidence, case-control for rare diseases or outbreaks, cross-sectional for prevalence.

4. Bias, confounding and validity

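  • Selection bias (the people studied aren't representative of the population the result is meant to apply to), recall bias (people with the outcome remember past exposures differently; notorious in case-control studies) and confounding (a third factor associated with both the exposure and the outcome that distorts the apparent link between them).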
  • Intention-to-treat analysis — analyse participants in their assigned group regardless of what they actually received; preserves randomisation and is the more conservative, exam-preferred approach.
  • Blinding reduces observer/performance bias.
  • Internal validity (is the result true for this study?) vs external validity (does it generalise?).
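
Why is ITT the conservative choice? This sketch compares it with a per-protocol analysis (only participants who actually took their allocated treatment), a comparison not made in the text above. The trial data and function names are hypothetical: two drug-arm patients stop the drug and both go on to have the event.

```python
from typing import NamedTuple


class Participant(NamedTuple):
    assigned: str  # arm allocated at randomisation
    received: str  # treatment actually taken
    event: bool    # did the outcome occur?


def event_rate(group: list[Participant]) -> float:
    # True counts as 1 and False as 0, so this is the proportion with the event
    return sum(p.event for p in group) / len(group)


def itt_rate(trial: list[Participant], arm: str) -> float:
    # Intention-to-treat: everyone is analysed in the arm they were randomised to
    return event_rate([p for p in trial if p.assigned == arm])


def per_protocol_rate(trial: list[Participant], arm: str) -> float:
    # Per-protocol: only those who actually received their allocated treatment
    return event_rate([p for p in trial if p.assigned == arm and p.received == arm])


# Hypothetical trial of 10 per arm
trial = (
    [Participant("drug", "drug", False)] * 7
    + [Participant("drug", "drug", True)]
    + [Participant("drug", "none", True)] * 2
    + [Participant("placebo", "placebo", False)] * 6
    + [Participant("placebo", "placebo", True)] * 4
)

itt_arr = itt_rate(trial, "placebo") - itt_rate(trial, "drug")
pp_arr = per_protocol_rate(trial, "placebo") - per_protocol_rate(trial, "drug")
print(f"ITT ARR {itt_arr:.3f}")           # 0.100
print(f"Per-protocol ARR {pp_arr:.3f}")   # 0.275
```

Dropping the two non-adherent patients, who may well have been sicker, nearly triples the apparent benefit. Keeping everyone where randomisation put them avoids that distortion.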

5. Interpreting results

  • p-value: the probability of a result at least this extreme if the null hypothesis were true (not the probability that the null hypothesis is true); p < 0.05 is the conventional threshold for "statistically significant".
  • Confidence interval (CI): the plausible range for the true value. For a ratio (RR, OR), a 95% CI that crosses 1 is not statistically significant at the 5% level. For a difference (ARR, mean difference), a 95% CI that crosses 0 is not statistically significant.
  • Statistical ≠ clinical significance: a huge trial can find a tiny, statistically significant effect that means nothing at the bedside.
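
The "does it cross the null?" rule is mechanical enough to write down. This sketch uses hypothetical helper names and, for the RR interval, assumes the usual large-sample log method with 1.96 standard errors; a published paper may have used a different method.

```python
import math


def ci_excludes_null(lower: float, upper: float, null_value: float) -> bool:
    """True if the whole CI lies on one side of the no-effect value (significant at that level)."""
    return upper < null_value or lower > null_value


def rr_with_95ci(a: int, b: int, c: int, d: int) -> tuple[float, float, float]:
    """Relative risk with an approximate 95% CI (large-sample log method).

    a/b = exposed with/without outcome, c/d = unexposed with/without outcome.
    """
    rr = (a / (a + b)) / (c / (c + d))
    se_log_rr = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    lower = math.exp(math.log(rr) - 1.96 * se_log_rr)
    upper = math.exp(math.log(rr) + 1.96 * se_log_rr)
    return rr, lower, upper


# The exam-style OR 95% CI of 0.8 to 1.4: a ratio, so the null value is 1
print(ci_excludes_null(0.8, 1.4, null_value=1.0))    # False: not significant
# A hypothetical ARR 95% CI of 0.02 to 0.06: a difference, so the null value is 0
print(ci_excludes_null(0.02, 0.06, null_value=0.0))  # True: significant

rr, lower, upper = rr_with_95ci(10, 990, 5, 995)
print(f"RR {rr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

The last line shows a point estimate of RR 2.00 whose interval runs from roughly 0.7 to 5.8: a doubling of risk that is still not statistically significant, because with so few events the interval crosses 1.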

6. Screening

Know the Wilson–Jungner criteria (the condition is an important health problem, there's a recognisable latent stage, a suitable and acceptable test, an accepted treatment, and screening is cost-effective). Watch for lead-time bias (screening appears to extend survival just by diagnosing earlier) and length-time bias (screening preferentially catches slower, more indolent disease).
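
Lead-time bias is pure arithmetic. In this sketch a single hypothetical patient's disease is screen-detectable at 60, would cause symptoms at 65, and kills them at 70 whether or not they are screened; the class and function names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    age_screen_detectable: int  # age at which screening would first detect the disease
    age_symptomatic: int        # age at which symptoms would lead to diagnosis anyway
    age_at_death: int           # assumed unchanged by screening in this example


def survival_from_diagnosis(p: Patient, screened: bool) -> int:
    # Screening only moves the diagnosis earlier; the age at death stays the same
    diagnosis_age = p.age_screen_detectable if screened else p.age_symptomatic
    return p.age_at_death - diagnosis_age


patient = Patient(age_screen_detectable=60, age_symptomatic=65, age_at_death=70)
print(survival_from_diagnosis(patient, screened=True))   # 10
print(survival_from_diagnosis(patient, screened=False))  # 5
```

"Survival" doubles from 5 to 10 years, yet the patient dies at 70 either way: the extra 5 years are lead time, not life gained. Comparing mortality rather than survival from diagnosis sidesteps this.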

7. How it's tested in the AKT

Three shapes recur:

  • Calculation — "from this 2×2 table, what is the NNT / sensitivity?"
  • Interpretation — "this 95% CI for the odds ratio is 0.8–1.4; what does it show?" (crosses 1 → not significant).
  • Design — "what is the best study type to answer this question?"

None require advanced maths — just the definitions above, applied cleanly.

8. Where MLA Prep fits

EBM rewards practice on worked examples more than reading, because the skill is applying the formula under time pressure. MLA Prep's SBAs include statistics and EBM stems with full worked explanations, and you can try them on two full topics free. Pair this with clinical reasoning for the UKMLA, since EBM is really reasoning under uncertainty with numbers attached.

Frequently asked questions

Is statistics actually on the UKMLA? Yes — evidence-based practice is part of the content map and is reliably tested. It's high-yield because it's predictable.

What statistics do I need to know? Sensitivity/specificity, PPV/NPV, likelihood ratios, RR and OR, ARR/RRR, NNT/NNH, p-values and confidence intervals, the study-design hierarchy, and common biases.

How much time should I spend on EBM? A few focused hours. The concepts are finite, so the return per hour is among the highest in your revision.

What's the hardest part? Usually remembering that PPV/NPV depend on prevalence, and reading confidence intervals correctly (crosses 1 for ratios, crosses 0 for differences).
