How accurate is the Step 2 CK score predictor?

In a chronological evaluation of 481 exams from 2025 and 2026, the median absolute error was about 4 points. Eighty-eight percent of estimates were within 10 points of the actual score, and the average bias was −0.4 points.

Which practice tests best predict the Step 2 CK score?

NBME 14 and UWSA 2 had the lowest single-assessment errors in this dataset. Recent NBME forms, UWSAs, and the New Free 120 (2023) performed within a narrow range, and estimates generally improved when several assessments were combined.

How does it compare with AMBOSS, PMSS, and other predictors?

This predictor had a median error of about 4 points and a mean absolute error of 5.1 points. The corresponding mean absolute errors were about 6.0 for the AMBOSS Score Predictor and the NBME-average rule, 6.9 for unattributed predictions that may include PMSS, 7.3 for the r/Step2 calculator, and 10 for usmlepredictor.com. Explicit AMBOSS Self-Assessment scores were analyzed separately from the Score Predictor row.

How the predictor works and how accurate it is

Enter at least two practice-test scores, qbank accuracies, CMS averages, or outside predictions to get an estimated Step 2 CK score and prediction range. Several recent assessments generally improve reliability and narrow the range. The predictor was built from more than 2,200 public r/Step2 score reports, then tested on 481 later exams that were not used to build it.

4 pts median absolute error across 481 later exams

88% of estimates were within 10 points

2,200+ public score reports used for development

From practice scores to one estimate

Enter at least two scores. Use any combination of practice-test results, qbank accuracy, CMS average, or an outside prediction. Test-taker status, qbank completion, assessment timing, and study duration can add context, but they do not count toward the two-score minimum.
Put different tests on a common scale. Each NBME, UWSA, Free 120, and AMBOSS self-assessment has its own relationship with final Step 2 scores.
Account for timing. A score from two days before the exam usually says more about exam-day performance than a score from two months earlier. Adding the date or days before your exam helps the calculator make that distinction.
Combine what you enter. When you provide more than one score, the predictor uses them together rather than relying on a simple average.
Show a range as well as a point estimate. The range reflects how much reported outcomes varied among test-takers with comparable inputs. Wider coverage settings produce wider ranges.
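
The calibrate, weight-by-timing, and combine steps above can be illustrated with a minimal sketch. This is not the site's model. The per-assessment offsets, the half-life, and the function name `combine` are all hypothetical placeholders chosen only to show how a recency-weighted combination differs from a simple average.

```python
# Hypothetical per-assessment offsets (points added to place a practice score
# on the Step 2 CK scale). Placeholder values, NOT the site's calibration.
OFFSETS = {"NBME 14": 10.0, "UWSA 2": 0.0}


def combine(entries, half_life_days=21.0):
    """Recency-weighted mean of calibrated scores.

    entries: list of (assessment_name, score, days_before_exam).
    half_life_days is an assumed value: a score taken that many days
    earlier receives half the weight of a score taken on exam day.
    """
    total = 0.0
    weight_sum = 0.0
    for name, score, days_before in entries:
        calibrated = score + OFFSETS[name]                # common scale
        weight = 0.5 ** (days_before / half_life_days)    # timing
        total += weight * calibrated
        weight_sum += weight
    return total / weight_sum


# The more recent NBME 14 counts twice as much as the UWSA 2 from 21 days out.
estimate = combine([("NBME 14", 240, 0), ("UWSA 2", 256, 21)])
print(estimate)
```

With these placeholder numbers the calibrated scores are 250 and 256, and the weighted result sits closer to the more recent one.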

Technical method

Each assessment is calibrated separately against final scores, including its timing before the exam. The calibrated inputs are then passed through 24 resampled versions of a linear model and a gradient-boosted tree model that is constrained to respond sensibly as scores rise. Their estimates are combined. The displayed ranges use conformal calibration, with coverage checked on the separate 481-exam evaluation set.
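
For readers unfamiliar with conformal calibration, the sketch below shows one common variant, split conformal prediction with absolute residuals. Whether the site uses this exact variant is an assumption, and the residuals and function names are made up for illustration.

```python
import math


def conformal_half_width(calibration_residuals, coverage):
    """Half-width of a split-conformal interval.

    calibration_residuals: actual minus predicted on held-out reports.
    coverage: target coverage, for example 0.80.
    """
    scores = sorted(abs(r) for r in calibration_residuals)
    n = len(scores)
    # Finite-sample rank ceil((n + 1) * coverage). When that rank exceeds n
    # the exact method gives an unbounded interval; this sketch caps it at
    # the largest residual for simplicity.
    rank = min(n, math.ceil((n + 1) * coverage))
    return scores[rank - 1]


def interval(point_estimate, calibration_residuals, coverage=0.80):
    """Symmetric interval around the point estimate."""
    half = conformal_half_width(calibration_residuals, coverage)
    return point_estimate - half, point_estimate + half


# Made-up residuals for illustration only.
residuals = [1, -2, 3, -4, 5, -6, 7, -8, 9]
print(interval(250, residuals, coverage=0.5))
```

Asking for higher coverage selects a larger residual, so the interval can only stay the same or widen.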

The calculation runs in your browser. The scores you enter are not uploaded.

How accurate is it?

Reports posted before 2025 were used to build the predictor. The 481 exams from 2025 and 2026 were kept separate and used only to measure accuracy after development was complete.

The median absolute error was about 4 points.
About 88% were within 10 points, and about 97% were within 15 points.
The correlation between predicted and actual scores was 0.80. Average bias was −0.4 points.
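
The error measures listed above can be computed as in the following sketch. The sign convention (bias is predicted minus actual, so a negative value means underprediction) is an assumption, and the example scores are invented.

```python
from statistics import mean, median


def accuracy_summary(predicted, actual):
    """Summarize prediction error for paired predicted and actual scores."""
    errors = [p - a for p, a in zip(predicted, actual)]  # assumed sign convention
    abs_errors = [abs(e) for e in errors]
    return {
        "median_abs_error": median(abs_errors),
        "mean_abs_error": mean(abs_errors),
        "within_10": sum(e <= 10 for e in abs_errors) / len(abs_errors),
        "bias": mean(errors),
    }


# Invented example scores, not taken from the dataset.
summary = accuracy_summary([250, 260, 240, 255], [252, 248, 241, 255])
print(summary)
```

In this invented example one 12-point miss pulls the mean absolute error (3.75) well above the median (1.5), which is why both figures are worth reporting.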

Predicted and actual scores for all 481 separate evaluation exams. The diagonal line marks exact agreement, and the shaded area covers ±10 points. Eighty-eight percent of estimates fall inside that area.

Interval coverage was also measured on the 481 separate evaluation exams. The 80% interval included the actual score in 81% of cases. Coverage was 90% for the 90% interval and 95% for the 95% interval.
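
Checking interval coverage amounts to counting how often the actual score lands inside the displayed range. A minimal sketch, with invented intervals and a hypothetical function name:

```python
def empirical_coverage(intervals, actual):
    """Fraction of actual scores that fall inside their (low, high) interval."""
    hits = sum(low <= a <= high for (low, high), a in zip(intervals, actual))
    return hits / len(actual)


# Invented intervals and outcomes for illustration.
intervals = [(240, 260), (230, 250), (250, 270), (245, 255)]
actual = [250, 251, 250, 260]
print(empirical_coverage(intervals, actual))
```

A well-calibrated 80% interval should give a value near 0.80 on exams that were not used to build the predictor.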

USMLE reports a standard error of about 7 points for Step 2 CK. The same student could therefore receive a score several points higher or lower on another administration. That amount of exam-level variation corresponds to roughly 5–6 points of expected absolute difference and limits how low a predictor's average error can be. This predictor's mean absolute error was 5.1 points.
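
The 5–6 point figure can be reproduced under one assumption, that exam-level variation is roughly normal. For a normal error with standard deviation σ, the expected absolute value is σ·√(2/π). The function name below is illustrative.

```python
import math


def expected_abs_error(standard_error):
    """Mean absolute deviation of a zero-mean normal error (assumes normality)."""
    return standard_error * math.sqrt(2 / math.pi)


print(round(expected_abs_error(7), 1))  # 5.6
```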

Error generally decreases when the predictor receives more practice scores and when those scores were obtained closer to the exam. Individual outcomes can still fall outside the displayed interval.

Predictor comparison

Every row uses the same error measures and compares a prediction with the actual score reported by the same person. Sample sizes differ because some rows use predictions posted by students, while tools with reproducible calculations were run on the separate 2025–2026 evaluation exams.

Predictor	Median error	Average error	Within 5	Within 10	Tendency
This site's predictor	~4 pts	~5.1 pts	59%	88%	bias −0.4
AMBOSS Score Predictor*	~5 pts	~6.0 pts	53%	83%	underpredicts by ~4
"NBME average + 13" rule†	~5 pts	~6.0 pts	52%	83%	overpredicts by ~1
Unattributed predictions (may include PMSS)‡	~6 pts	~6.9 pts	47%	77%	underpredicts by ~5
r/Step2 calculator§	~7 pts	~7.3 pts	41%	71%	underpredicts by ~6
usmlepredictor.com	~9 pts	~10 pts	25%	58%	underpredicts by ~9

* These 252 posted AMBOSS predictions are classified as the Score Predictor unless the post explicitly labeled the value as an AMBOSS Self-Assessment score. Explicit AMBOSS SA results are analyzed separately. The newer qbank Expected Score is not included in this row.

† The NBME-average rule underpredicted by about 5 points when the average was below 235 and overpredicted by about 6 points when the average was above 255. It missed by at least 15 points more than twice as often as this site's predictor.

‡ These 1,031 reports included a predicted score without a named source. Some may come from PMSS, but the source cannot be confirmed, so the table groups them as unattributed predictions.

§ The r/Step2 calculator was last updated in 2022 and does not accept forms newer than NBME 12.

This site's predictor had the lowest median and average error in this comparison. The AMBOSS Score Predictor and the NBME-average rule followed. The r/Step2 calculator and usmlepredictor.com underpredicted recent outcomes by about 6 and 9 points, respectively.

What each row represents

This site's predictor. It was built from reports posted before 2025 and tested on 481 separate exams from 2025 and 2026. Its median error was about 4 points, 88% of estimates were within 10 points, and average bias was −0.4 points.
AMBOSS Score Predictor. This row uses 252 AMBOSS predictions posted with actual scores. Posts explicitly labeled as AMBOSS Self-Assessment results were excluded from this row and analyzed with the other assessment scores. Median error was about 5 points, and the Score Predictor underpredicted by about 4 points on average.
"NBME average + 13." The 13-point adjustment was estimated from pre-2025 reports and tested on 469 of the separate 2025–2026 evaluation exams. Its overall error was similar to the AMBOSS row, but error direction changed at the low and high ends of the score range. The rule does not distinguish among NBME forms or use time-to-exam.
PMSS and other unattributed predictions. This row pools 1,031 posted predictions without a confirmed source. Median error was about 6 points, and average underprediction was about 5 points.
r/Step2 calculator. The live calculator was run 120 times using outcomes from the 2025–2026 evaluation group. It reports a point estimate with a ±14-point range and underpredicted these recent outcomes by about 6 points on average.
usmlepredictor.com. Its public calculation was checked against the live site's output and then run on the 481 evaluation exams. Median error was about 9 points, mean absolute error was about 10 points, and average underprediction was about 9 points.
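
For reference, the "NBME average + 13" rule is simple enough to write in a few lines. The function name is illustrative, and the example scores are invented.

```python
from statistics import mean


def nbme_average_rule(nbme_scores, adjustment=13):
    """Average the NBME scores and add a fixed adjustment.

    Every form is treated alike and timing is ignored, which is the
    limitation described above.
    """
    return mean(nbme_scores) + adjustment


print(nbme_average_rule([240, 244, 248]))  # 257
```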

How to evaluate a predictor comparison

The age of the development data matters because available forms and score patterns change. Using several assessments and their timing also captures more information than a single average. A useful evaluation reports both error and bias, and it tests predictions on outcomes that were not used to build the predictor. Median error alone can hide systematic underprediction or large misses at the edges of the score range.

Median error is the value below which half of the absolute errors fall. Average error is the mean absolute error, which is more affected by large misses. Last benchmarked June 2026.

Practice-test rankings

The ranking shows the typical error when each assessment is used by itself. It accounts for days before the exam and gives more weight to recent score reports. Lower values indicate smaller errors.

NBME 14 · UWSA 2: ± ~5.9–6.0
NBME 13, 15 · UWSA 3: ± ~6.0–6.1
NBME 10, 11, 12: ± ~6.3–6.5
New Free 120 (2023) · UWSA 1 · Old Free 120 (2021): ± ~6.6–6.7
NBME 9 · AMBOSS self-assessment: ± ~6.9–7.2

Typical errors differ by about one point across the list. The full predictor requires at least two scores, and multiple recent assessments generally improve reliability and narrow the prediction range. Only about 90 NBME 16 reports were available, so its ranking remains uncertain. The early estimate is similar to the other recent NBMEs.

Data source and limitations

The dataset comes from public r/Step2 score-release posts that include practice results and an actual Step 2 score. Reports are extracted, reviewed, and retained only when the numbers are unambiguous. More than 2,200 reports posted between January 2023 and May 2026 were used for development.

Each record pairs reported practice scores with the reported final score. The test-taker comparison uses the same dataset. Records contain numeric assessment data and year-level timing; usernames and post text are excluded.

The sample is not representative of all Step 2 test-takers. Its mean score is about 257, roughly 7 points above the official average. Higher scorers are overrepresented, all results are self-reported, and the dataset contains few retakes. The displayed percentile is calculated from the official USMLE distribution for US MD first-time test-takers from July 2022 through June 2025, as published in the USMLE Score Interpretation Guidelines.

Reading the results

The curve displays the estimated distribution of possible scores. Taller portions represent higher estimated probability. The shaded tail corresponds to the odds shown below it. Hover or drag across the curve to check the estimated chance of reaching a selected score.

The range selector changes the interval coverage to 67%, 80%, 90%, or 95%. A 67% interval is expected to contain about two of every three outcomes for people with similar inputs. Higher coverage produces a wider interval.

The "How each score affects the estimate" panel recalculates the prediction after removing one input at a time. The displayed change shows how much each entry affected the combined estimate.
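
The remove-one-input recalculation can be sketched as follows. A plain mean stands in for the predictor's actual combination step, which is an assumption made only to keep the example short. The function name and input values are hypothetical.

```python
from statistics import mean


def leave_one_out_effects(inputs, combine=mean):
    """Change in the combined estimate attributable to each input.

    inputs: dict mapping an input name to its calibrated estimate.
    combine: stand-in for the real combination step (assumed: plain mean).
    Requires at least two inputs so that something remains after removal.
    """
    full = combine(list(inputs.values()))
    effects = {}
    for name in inputs:
        rest = [value for key, value in inputs.items() if key != name]
        effects[name] = full - combine(rest)
    return effects


print(leave_one_out_effects({"A": 250, "B": 256, "C": 262}))
```

A positive value means the entry pulled the combined estimate up, and a negative value means it pulled the estimate down.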

The "Your trajectory" panel converts each timed assessment to its associated Step 2 estimate and fits a trend over time. Because it adjusts for differences between forms, the trajectory may show a smaller change than the raw practice scores.
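
One simple way to fit such a trend is an ordinary least-squares line through the converted estimates. The source does not say which trend model is used, so the linear form here is an assumption, and the data are invented.

```python
def fit_trend(days_before_exam, estimates):
    """Least-squares line through (time, estimate) points.

    Returns (slope in points per day, projected estimate on exam day).
    Needs at least two assessments taken on different days.
    """
    n = len(estimates)
    xs = [-d for d in days_before_exam]  # time increases toward exam day (x = 0)
    x_bar = sum(xs) / n
    y_bar = sum(estimates) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, estimates))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar  # fitted value at x = 0
    return slope, intercept


# Invented estimates taken 30, 20, and 10 days before the exam.
print(fit_trend([30, 20, 10], [245, 250, 255]))  # (0.5, 260.0)
```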

The "Test-takers with similar scores" panel becomes available after you enter at least two practice assessments. It identifies reports with the closest practice results after putting assessments on a common scale, then displays their reported final scores.
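
A nearest-match lookup of this kind might look like the sketch below. The distance measure (mean absolute difference over the assessments both records share) is an assumption, and the reports are invented.

```python
def closest_reports(user_scores, reports, k=3):
    """Final scores from the k reports most similar to the user's inputs.

    user_scores: dict of assessment name to score on a common scale.
    reports: list of (scores_dict, final_score) pairs on the same scale.
    """
    def distance(report_scores):
        shared = set(user_scores) & set(report_scores)
        if not shared:
            return float("inf")  # nothing to compare
        return sum(abs(user_scores[t] - report_scores[t]) for t in shared) / len(shared)

    ranked = sorted(reports, key=lambda report: distance(report[0]))
    return [final for _, final in ranked[:k]]


# Invented reports for illustration.
reports = [
    ({"NBME 14": 251, "UWSA 2": 254}, 258),
    ({"NBME 14": 230, "UWSA 2": 235}, 240),
    ({"NBME 14": 248}, 256),
]
print(closest_reports({"NBME 14": 250, "UWSA 2": 255}, reports, k=2))  # [258, 256]
```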

Specialty data

Specialty means describe matched applicants. Most values come from the 2024 NRMP Charting Outcomes in the Match. Ophthalmology uses SF Match data, Urology uses AUA match data, and Thoracic Surgery uses a program-director survey. Values were read from published charts and are approximate. The displayed spread uses an SD of about 13 points, consistent with the overall matched-applicant distribution. The table describes Step 2 scores and does not estimate match probability.

Privacy

The predictor processes and saves entered scores in your browser rather than sending them to this site's application servers. Saved inputs use browser local storage. A shared link contains the entered values, so anyone with that link can read them. The site collects anonymous page-view counts.

Charts & guides

These pages use the same score-report dataset for related analyses and conversions.

Score Twins: See score reports from people with similar practice results
What raises your score: Study variables associated with actual Step 2 scores
UWorld vs AMBOSS: Compare qbank results with actual Step 2 scores
Predictor comparison: Accuracy results for six prediction methods
Score swings & late drops: Data on practice-score ranges and late score drops
When to schedule your exam: Compare recent practice scores with actual Step 2 outcomes

Predictor and data last updated June 2026.
Not affiliated with the USMLE®, NBME®, NRMP®, UWorld, or AMBOSS. Predictions are statistical estimates from self-reported data, and individual outcomes vary.