What actually raises your Step 2 score
Every popular study decision, measured against the real exam scores of 2,426 students from r/Step2 score reports. The answers are not what the folklore says.
The short version
This page is built from 2,426 real score reports, including 1,006 in which the first and last practice tests could be dated. Two findings frame everything else:
- Scores move a lot. The average student gained +24 points between their first practice test and the real exam. Students starting at 200–224 gained +35.
- Resource choices barely move them. Which qbank, questions past ~2,500, CMS forms during dedicated, extra weeks: each measures roughly zero. The points come from the studying itself, plus two specific habits covered below.
How much students actually improve
Each row tracks the same students from their first practice test (taken 35+ days out) through their last practice test to the real exam (n = 1,006 reports).
| Starting score | First test | Last test | Real exam | Total gain | Reports |
|---|---|---|---|---|---|
| 200–224 | 215 | 243 | 250 | +35 | 266 |
| 225–239 | 233 | 251 | 257 | +24 | 312 |
| 240–254 | 246 | 259 | 263 | +17 | 268 |
| 255+ | 262 | 267 | 268 | +6 | 118 |
Two things are worth knowing about that average +24. First, +19 of it shows up on the practice tests themselves and +5 is the final step up to the real exam, which tends to land above late practice tests. Second, it is not a quirk of mixing different tests: looking only at students whose first and last tests were both NBMEs, the gain is +18 points on the exact same scale. The improvement is real, it is large, and it is largest for the people furthest behind. A few points of the lowest group's gain are a rebound from an unlucky first test, but only a few.
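The same split can be checked band by band from the table. This is a minimal sketch that only re-does the arithmetic on the published rows; the function name `split_gain` is made up for illustration.

```python
# Rows copied from the table above:
# (starting band, first test, last test, real exam, stated total gain)
ROWS = [
    ("200-224", 215, 243, 250, 35),
    ("225-239", 233, 251, 257, 24),
    ("240-254", 246, 259, 263, 17),
    ("255+", 262, 267, 268, 6),
]


def split_gain(first, last, real):
    """Return (gain on practice tests, final step to the real exam, total gain)."""
    practice_gain = last - first
    exam_step = real - last
    return practice_gain, exam_step, practice_gain + exam_step


for band, first, last, real, stated_total in ROWS:
    practice_gain, exam_step, total = split_gain(first, last, real)
    # The recomputed total should agree with the "Total gain" column.
    assert total == stated_total
    print(f"{band}: +{practice_gain} on practice tests, "
          f"+{exam_step} to the real exam, +{total} total")
```

In every band most of the gain is already visible on practice tests (28 of 35, 18 of 24, 13 of 17, 5 of 6), with a smaller final step up on exam day.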
What each choice is actually worth
Every study decision students argue about, measured on real scores:
| Choice | The evidence |
|---|---|
| Second pass of your weak material | Two-pass students beat one-pass students with the same first-pass accuracy by +5.0 points (first pass 55–64%) and +3.4 (65–74%). At 75%+ it adds nothing (0.0). n = 248 two-pass reports. |
| Adding a second qbank | At the same UWorld accuracy (65–74%), students who added AMBOSS scored +3.9 higher. The gap shrinks to about zero at 75%+. Partly a real boost, partly just who chooses to do two banks. |
| More practice exams | Comparing students with the same starting score and timeline, each added practice test brings up to a point of extra improvement; the 4-plus group beat the 3-or-fewer group by +0.7 points (n = 221). They also tell you where you stand, so they pay for themselves twice. |
| More questions past ~2,500 | Total questions completed vs final score: r = 0.02 across 472 reports. Flat from 1,500 to 4,300+, and still flat when comparing students who started from the same score. |
| CMS forms during dedicated | US MD/DO students who reported CMS averages improved +21.1 from first to last practice test; students who never mentioned them improved +21.1 (n = 112 vs 179). Identical. |
| Switching banks (UWorld vs AMBOSS) | The same percent correct points to the same real score on either bank, within about 2 points everywhere. Full comparison. |
| More dedicated weeks | Students who studied longer gained less, because struggling students are the ones who add weeks. Any benefit of extra time is buried under that. Study-time data. |
These comparisons match students on where they started (same first-pass accuracy, same first practice test score) and track improvement rather than raw scores, but they come from self-reported data, not a randomized trial. They are still the best numbers anyone has.
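To make "matching on where they started" concrete, here is a rough sketch of a stratified comparison: group reports by first-pass accuracy band, then compare two-pass and one-pass students only within the same band. The records are invented purely for illustration and are not drawn from the 2,426 reports; the "under 55%" band and all names are assumptions, and the page's actual method may differ.

```python
from collections import defaultdict
from statistics import mean

# Invented records for illustration only; not data from the score reports.
# Each record: (first-pass accuracy in percent, did a second pass?, real exam score)
REPORTS = [
    (58, True, 246), (61, False, 240), (63, True, 248), (57, False, 243),
    (68, True, 255), (72, False, 251), (70, True, 256), (66, False, 253),
]


def accuracy_band(accuracy):
    """Assign a report to a first-pass accuracy band (the lowest band is an assumption)."""
    if accuracy >= 75:
        return "75%+"
    if accuracy >= 65:
        return "65-74%"
    if accuracy >= 55:
        return "55-64%"
    return "under 55%"


def matched_gaps(reports):
    """Per band: mean score of two-pass students minus mean score of one-pass students."""
    groups = defaultdict(lambda: {True: [], False: []})
    for accuracy, second_pass, score in reports:
        groups[accuracy_band(accuracy)][second_pass].append(score)
    gaps = {}
    for band, by_choice in groups.items():
        # A band only yields a comparison if both groups are present.
        if by_choice[True] and by_choice[False]:
            gaps[band] = mean(by_choice[True]) - mean(by_choice[False])
    return gaps


print(matched_gaps(REPORTS))
```

Comparing within bands keeps a strong student who happened to do two passes from being compared against a weaker student who did one. It does not remove self-selection, which is why the caveat about self-reported data still applies.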
If none of this matters, where do the +24 points come from?
From the one thing this data cannot measure: the studying everyone does. Weeks of full-time questions, review, and practice exams. Every single student in 2,426 reports did it. The 1st percentile of question volume is 430 questions; nobody skipped dedicated; nobody skipped practice exams entirely. A comparison needs two groups, and there is no group that skipped the work, so it shows up as zero in every table on this page while quietly producing the +24.
What actually differs between students is the small stuff: which bank, how far past 2,500 questions, whether to redo CMS forms, one more week. The small stuff measures near zero because it genuinely matters very little. Underneath, everything tracks the same thing, how much medicine you know: NBME scores, UWorld percentages, UWSA scores and the real exam all rise and fall together (correlations of 0.6–0.8), and once one is known the rest add almost nothing. Any serious question bank builds that knowledge about equally well, which is why swapping one for another changes so little. People who improved 25 points credit whatever resources they happened to use. But everyone improved, whatever they used, because underneath they were all doing the same thing: questions, review, fixing weak spots.
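For readers unsure what the r values on this page mean, the sketch below computes a Pearson correlation from scratch. The sample numbers are invented for illustration and are not taken from the score reports; `pearson_r` is an illustrative name, not the code behind this page.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation: +1 is a perfect rising line, 0 is no linear relationship."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / sqrt(var_x * var_y)


# Invented example values, not data from the reports.
accuracy = [55, 60, 65, 70, 75]        # qbank percent correct
scores = [238, 247, 251, 259, 262]     # real exam scores
print(round(pearson_r(accuracy, scores), 2))
```

On Python 3.10 or later, `statistics.correlation` in the standard library computes the same quantity.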
The playbook the data supports
- Pick one serious qbank. Either one. Run it once, properly reviewed. This is where the points come from, and it is so universal the data cannot even measure it.
- Track accuracy, never question count. Accuracy predicts the score at r = 0.57; volume at r = 0.02. Counting questions feels like progress; it predicts nothing.
- Under 75% on your first pass: repeat your weak material. Worth +3 to +5 points, the largest measured effect of anything on this page.
- At 75%+: stop grinding questions. More questions, a second bank, another pass: all of it measured zero for high scorers. Practice exams and timing are what is left.
- Take 4 or more practice exams, spaced. Each one is worth about half a point to a point of improvement, and you need them anyway to know where you stand.
- CMS forms are for shelves. During dedicated they showed no measurable effect on improvement. Cut them first when time is short.
- Do not add weeks to fix a number. Students who studied longer did not gain more. Fix your accuracy, not your calendar. Flat trajectory or scary swings? What the score-swing data shows.
And when you want to know where you stand, that is what the free predictor is for: it reads your practice exams the way this page reads all 2,426 of them.
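The thresholds in the playbook above can be written down as a small decision rule. This is only a sketch of the list as stated (the 75% first-pass cutoff and the four-exam target); the function name and wording of the messages are made up here, and it is not the predictor.

```python
def playbook(first_pass_accuracy, practice_exams_taken):
    """Return the playbook's advice for one student as a list of short strings.

    first_pass_accuracy: percent correct on the first pass of the qbank (0-100).
    practice_exams_taken: number of practice exams completed so far.
    """
    advice = ["Track accuracy, not question count."]
    if first_pass_accuracy < 75:
        # Under 75%: the second pass of weak material measured +3 to +5 points.
        advice.append("Repeat your weak material.")
    else:
        # At 75%+: more questions, a second bank, another pass all measured zero.
        advice.append("Stop grinding questions; focus on practice exams and timing.")
    if practice_exams_taken < 4:
        remaining = 4 - practice_exams_taken
        advice.append(f"Take at least {remaining} more practice exam(s), spaced out.")
    return advice


for line in playbook(first_pass_accuracy=62, practice_exams_taken=2):
    print("-", line)
```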
Common questions
How much can I realistically improve before Step 2?
The average student in 1,006 tracked reports improved +24 points from first practice test to the real exam. Starting low means gaining more: students starting 200–224 gained +35 on average, students starting 255+ gained +6. Most of it is real learning, not score inflation: comparing NBME-to-NBME only, the gain is +18 points on the same scale.
Does it matter which question bank I use for Step 2?
No. The same percent correct maps to the same real score on UWorld and AMBOSS, a qbank percentage tells you almost nothing your NBME scores have not already told you, and no swap of resources measured more than a few points. Pick one serious bank and run it properly. The head-to-head data.
What is the highest-yield thing to do during dedicated?
Work in cycles: do questions, review every miss, patch the weak topic, then check yourself with a practice exam. The two habits with a measurable payoff are a second pass of your weak material if your first pass is under 75% (+3 to +5 points) and taking at least four practice exams. Track accuracy, not question count: accuracy correlates with the real score at r = 0.57, volume at r = 0.02.
Why does everyone swear by resources that show no effect in the data?
Because everyone improves about +24 points while using them, and people credit whatever they happened to use. The improvement comes from the weeks of full-time question work itself, which every student in the data did. The specific bank, the volume past ~2,500, and the add-on resources are the only part that differs between students, and that part measures close to zero.
Get a real prediction, not a rule of thumb
The full predictor combines all of your practice tests with their timing, shows calibrated probability ranges, and was the most accurate of all the predictors tested against real scores. See the head-to-head comparison.