Why are my practice scores so inconsistent?

This page measures score swings, sudden drops before test day, and flat trajectories using real score reports from 2,176 students.

Main findings

The median gap between a student's best and worst practice test is 24 points. 95% of students swing 10+ points.
Practice forms differ in harshness by about 11 points, and a 200-question test is noisy on its own.
Among students whose tests spanned 10+ points, the real exam beat the worst practice test 99% of the time and landed within about a point of the best at the median.
A 10-point late drop moved the expected real score by about 1 point.

How much swing is normal

Each student's gap between their best and worst practice test, n = 2,176 students with 3+ score-scale tests.

The middle half of students swing between 17 and 31 points. 83% swing 15 or more, and 65% swing 20 or more. Variation of this size was common in the dataset.
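
The swing statistic above is simple to reproduce on your own scores. The sketch below shows one way to compute it, assuming each student is represented as a list of practice scores. The three students and the helper names `swing` and `share_at_least` are hypothetical and are not data or code from this analysis.

```python
from statistics import median

def swing(scores):
    """Gap between a student's best and worst practice score."""
    return max(scores) - min(scores)

def share_at_least(gaps, threshold):
    """Fraction of students whose gap is at least `threshold` points."""
    return sum(g >= threshold for g in gaps) / len(gaps)

# Hypothetical students with 3+ practice scores each (illustration only).
students = {
    "A": [231, 244, 252, 249],
    "B": [240, 238, 247],
    "C": [225, 251, 243, 258],
}

gaps = [swing(scores) for scores in students.values()]
print("gaps:", gaps)
print("median gap:", median(gaps))
print("share with 15+ point swing:", share_at_least(gaps, 15))
```

Run on the real dataset, the same calculation would give the median of 24 and the 83% share reported above.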

How form difficulty contributes to score swings

The forms are not equally harsh. Holding timing fixed (tests taken 7–30 days out), here is how far below the real exam each form typically lands:

Practice test	Real exam ran higher by (points)	Reports
UWSA 3	+17	172
NBME 9	+15	199
UWSA 1	+15	475
NBME 12	+14	736
AMBOSS SA	+13	44
NBME 10	+12	563
NBME 11	+12	737
NBME 13	+11	897
NBME 14	+8	688
NBME 15	+7	194
UWSA 2	+6	720

Scores on UWSA 2 and UWSA 3 taken in the same week can differ by about 11 points because of form difficulty alone. Adjusting every score for its form cuts the typical student's score variability from 8.0 to 6.9 points. The remaining variation is consistent with ordinary sampling error: a 200-question exam samples a broad content domain, so scores vary by a few points from one sitting to the next (the USMLE reports a standard error of about 7 points on the real exam). Together, form difficulty and sampling error explain much of the observed swing. The conversion pages quantify each form separately.
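
A minimal sketch of the form adjustment described above: add each form's typical offset to the raw score, then compare the spread before and after. The offsets are the values from the table. The example student is hypothetical, and using the sample standard deviation as the variability measure is an assumption, since the analysis does not name the measure it used. The offsets were estimated for tests taken 7–30 days out, so applying them outside that window is also an assumption.

```python
from statistics import stdev

# Points by which the real exam typically ran above each form (table above).
FORM_OFFSET = {
    "UWSA 3": 17, "NBME 9": 15, "UWSA 1": 15, "NBME 12": 14,
    "AMBOSS SA": 13, "NBME 10": 12, "NBME 11": 12, "NBME 13": 11,
    "NBME 14": 8, "NBME 15": 7, "UWSA 2": 6,
}

def adjust(form, score):
    """Shift a practice score onto the real-exam scale using its form offset."""
    return score + FORM_OFFSET[form]

# Hypothetical student: (form, raw score). Not data from this analysis.
tests = [("UWSA 3", 238), ("NBME 13", 243), ("UWSA 2", 250)]

raw = [score for _, score in tests]
adjusted = [adjust(form, score) for form, score in tests]

print("raw:", raw, "spread:", round(stdev(raw), 1))
print("adjusted:", adjusted, "spread:", round(stdev(adjusted), 1))
```

In this made-up case most of the apparent climb comes from taking a harsh form first and a lenient form last. After adjustment the three scores sit within two points of each other.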

Does inconsistency hurt your score?

Not in this dataset. Holding the practice-test average fixed, the most erratic third of students finished slightly higher than the steadiest third:

Practice average	Steadiest third	Most erratic third	Difference
225–239	248	252	+4.0
240–249	255	258	+2.6
250–259	263	266	+2.6
260–274	270	271	+1.2

A large swing often includes an early low score followed by later improvement. Prediction error was similar across swing sizes, so greater variability did not make final outcomes harder to estimate.
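
The comparison in the table can be sketched as follows: within one practice-average band, rank students by how much their scores vary, then compare the mean real score of the most erratic third with that of the steadiest third. This is an illustration under stated assumptions. The six students are hypothetical, the population standard deviation stands in for the unnamed variability measure, and the function name `tercile_gap` is invented for this example.

```python
from statistics import mean, pstdev

def tercile_gap(students):
    """Mean real score of the most erratic third minus the steadiest third.

    `students` is a list of (practice_scores, real_score) pairs that all
    fall in the same practice-average band.
    """
    ranked = sorted(students, key=lambda s: pstdev(s[0]))
    k = len(ranked) // 3
    if k == 0:
        raise ValueError("need at least three students in the band")
    steadiest = ranked[:k]
    most_erratic = ranked[-k:]
    return mean(r for _, r in most_erratic) - mean(r for _, r in steadiest)

# Hypothetical students, all averaging about 245 on practice tests.
band_240_249 = [
    ([244, 245, 246], 254),
    ([243, 245, 247], 256),
    ([240, 245, 250], 255),
    ([238, 246, 251], 257),
    ([232, 246, 257], 259),
    ([230, 247, 258], 258),
]

print("erratic minus steady:", tercile_gap(band_240_249))
```

Banding by average first is what keeps the comparison fair: without it, erratic and steady students could differ simply because one group scores higher overall.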

How the real exam compares with your best and worst tests

Among the 2,061 students whose tests spanned 10+ points, the real exam landed a median of 1 point above their best test, 13 points above their average, and 26 points above their worst. It beat the worst test 99% of the time and beat the best 55% of the time.

The lowest score is especially sensitive to form difficulty, test-day variation and timing. The highest score is often more recent and may come from a less harsh form. The full set of scores, adjusted for form and timing, gives a more stable estimate than choosing either extreme alone.

A low final practice test

Among 1,485 students whose final practice test fell within 3 weeks of their exam:

Only 4% saw their final test land 5+ points below their earlier average. Most people peaked at the end.
The 26 students whose final test dropped 8+ points still finished a median of +9 points above their earlier average, about the same as students whose final test was steady (+10). Their actual scores were a median of +18 points above that final practice test. This was a small group, so the estimate is imprecise.
Across everyone, the final test's deviation from the prior average had little predictive value. A 10-point late drop moved the expected real score by about 1 point, not 10.

A rescheduling decision should use the overall score level and trajectory. One unusually low final test had little independent association with the real score in this analysis.
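
The arithmetic behind the "about 1 point" figure can be sketched as a slope applied to the late deviation. The slope of 0.1 is simply the ratio implied by the finding above (about 1 point of real-score change per 10 points of late drop); it is an approximation for illustration, not a fitted coefficient published by this analysis. The function name and example scores are hypothetical.

```python
# Roughly 1 point of expected real-score change per 10 points of late deviation.
LATE_DEVIATION_SLOPE = 1 / 10

def expected_shift(final_score, prior_scores, slope=LATE_DEVIATION_SLOPE):
    """Approximate change in expected real score implied by the final test.

    The deviation is the final practice score minus the average of the
    earlier practice scores; only a small fraction of it carries through.
    """
    prior_average = sum(prior_scores) / len(prior_scores)
    deviation = final_score - prior_average
    return slope * deviation

# Hypothetical student: earlier tests average 248, final test is 238.
shift = expected_shift(238, [244, 248, 252])
print("expected shift in real score:", shift)
```

A final test 10 points below the earlier average lowers the expectation by about 1 point, which is why a single low final score carries little weight on its own.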

What helped students with a flat trajectory

For each student with four or more dated tests, a slope was fit through the first half of the tests. Final outcomes were then compared with predictions based only on that early score level:

Early trajectory	Finished vs early-level expectation	Students
Flat or declining (≤ +0.25 pts/week)	-3.2	280
Moderate climb (+0.25 to +2.5 pts/week)	-0.8	451
Steep climb (> +2.5 pts/week)	+2.1	586
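
The trajectory grouping can be sketched as a least-squares slope through the first half of a student's dated tests, classified with the thresholds from the table. This is a hedged illustration: the analysis does not say how an odd number of tests is split, so rounding the first half down is an assumption, and the function names and example dates are hypothetical.

```python
from datetime import date

def slope_per_week(tests):
    """Least-squares slope in points per week through (date, score) pairs."""
    start = tests[0][0]
    xs = [(d - start).days / 7 for d, _ in tests]
    ys = [score for _, score in tests]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("tests must fall on at least two different dates")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

def early_trajectory(tests):
    """Classify the early trend using the first half of the dated tests."""
    if len(tests) < 4:
        raise ValueError("need four or more dated tests")
    ordered = sorted(tests)
    first_half = ordered[: len(ordered) // 2]  # assumption: round down
    slope = slope_per_week(first_half)
    if slope <= 0.25:
        label = "flat or declining"
    elif slope <= 2.5:
        label = "moderate climb"
    else:
        label = "steep climb"
    return slope, label

# Hypothetical student with four dated tests.
tests = [
    (date(2024, 1, 1), 230),
    (date(2024, 1, 15), 234),
    (date(2024, 1, 29), 241),
    (date(2024, 2, 12), 246),
]
slope, label = early_trajectory(tests)
print(f"{slope:+.2f} pts/week -> {label}")
```

With only four tests the early slope rests on two points, so it is a rough signal rather than a precise rate.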

Students with a steep early climb went on to finish about 2 points above the estimate based on their early level. Students with a flat or declining early trend finished about 3 points below it, and moderate climbers landed close to it. Additional questions, more weeks and switching qbanks were examined separately:

Inside every trajectory group, the most-questions third finished similarly to the least-questions third. The largest difference was -1.5 points, which is within sampling variation for these groups.
Flat-trajectory students with the longest dedicated periods finished about 7 points worse against expectations than those with the shortest periods (n = 59 per group). This is likely affected by selection because struggling students are more likely to extend. The data did not show an improvement associated with extra weeks in this group.
The same percent correct mapped to approximately the same real score on UWorld and AMBOSS. Switching banks was not associated with a higher final score. See the comparison.

A second pass of weak material was associated with about +0.9 points among students with flat-to-moderate trajectories (83 repeaters). The association was larger among students below 75% on their first pass. It was not associated with an additional gain among students whose scores were already climbing steeply.

For students with flat scores, completing more questions in the same way was not associated with better outcomes. A targeted second pass was associated with improvement, especially when it focused on weak systems and the reasons behind missed questions. Use each practice exam to identify specific content or reasoning errors, then change the next review block accordingly.

When scores remain flat, review strategy should be reassessed before adding more study time. More on the measured associations with score improvement: the full evidence review.

Common questions

Is a 15-point swing between practice tests normal?

Yes. Among 2,176 students with three or more practice tests, the median gap between best and worst was 24 points, and 83% had a gap of 15 or more. Large swings were common.

My NBME dropped right before my exam. Should I reschedule?

Only 4% of students had a final practice test at least 5 points below their earlier average. Students with large late drops finished about as far above their earlier tests as everyone else. Across all students, a 10-point drop on the final test moved the expected real score by only about 1 point. Use the overall score level and trajectory when deciding whether to reschedule.

Should I trust my best or my worst practice test?

Among students whose tests spanned 10+ points, the real exam exceeded the lowest practice score 99% of the time and landed within about one point of the highest score at the median. The full predictor uses the overall average, form differences and timing.

My scores are flat. Should I extend my dedicated period?

Students with a flat early trajectory who studied the longest finished about 7 points worse against expectations than flat-trajectory students with short dedicated periods, partly because struggling students are more likely to extend. A targeted second pass of weak material was associated with about +0.9 points among students with flat-to-moderate trajectories. Reassess your review method before extending the schedule.
