The Comprehensive Guide to Understanding and Interpreting JLPT Scores

JLPT certificate of results and scores for N5, N4, N3

If there is one thing about the Japanese Language Proficiency Test (JLPT) that confuses most test-takers, it is the test results.

It seems that everybody has their own interpretation of how the scoring works in the JLPT and they talk about it as if scoring is shrouded in mystery. I thought, shouldn’t there be information about this that is publicly available?

Not surprisingly, the official JLPT website does provide information on the scoring. The same explanation, printed on a separate sheet, also accompanies the certificate and official score reports. I can only guess that nobody really bothered to read it and understand its implications.

Having a clear idea about how scoring works will help you develop a proper mindset and study approach for the JLPT. The right attitude and thorough preparation are important if you want to pass the JLPT with a strong score.

If that sounds attractive enough for you, I hope you’ll take the time to read this article as we explore each component of the JLPT score report.

Components of the official score report

You’ve anxiously waited nearly two months after test day, and now your results have arrived! Beyond passing and failing, you are probably curious to know how well you actually did in the exam.

There are three parts of the score report that measure your performance: test scores (overall and sectional), percentile rank, and reference information. Of course, the most important information is the test score, and it ultimately determines whether you passed or failed.

How is pass or fail determined?

In my article about basic information on the JLPT, I briefly discussed the scoring in the test.

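The total score ranges from 0 to 180 points. For N1 to N3, each of the three sections is scored from 0 to 60 points. For N4 and N5, Language Knowledge and Reading are combined into one section scored from 0 to 120 points, while Listening is scored from 0 to 60 points.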

The final result, “pass” or “fail”, is determined by passing each test section AND the whole exam itself. Your score must be at or above the overall and sectional pass marks as shown in the table below:

The overall and sectional pass marks for each level of the JLPT

If you get 60 points each for two sections and 18 for one section, the total score is 138. The overall pass mark is met, but sectional pass marks are not. Thus, the result is “Failed.”

If you get 19 points in each of the three sections, the total score is 57. The sectional pass marks are met, but the overall pass mark is not. The result will also be “Failed.”

Again, you must attain BOTH the overall and sectional pass marks in order to pass the exam.
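The rule can be sketched in a few lines of Python. This is only an illustration: the function name is made up, the sectional pass mark of 19 is taken from the examples above, and the overall pass mark of 95 is the N3 figure mentioned later in this article. It assumes three sections scored out of 60 each.

```python
def jlpt_result(section_scores, overall_pass_mark, sectional_pass_mark=19):
    """Return "Passed" only if every section AND the total meet their pass marks."""
    total = sum(section_scores)
    sections_ok = all(score >= sectional_pass_mark for score in section_scores)
    overall_ok = total >= overall_pass_mark
    return "Passed" if sections_ok and overall_ok else "Failed"

# The two failing examples from the text, checked against an overall pass mark of 95.
print(jlpt_result([60, 60, 18], overall_pass_mark=95))  # one section is below 19
print(jlpt_result([19, 19, 19], overall_pass_mark=95))  # total of 57 is below 95
```

Both calls print "Failed", each for a different reason: the first misses a sectional pass mark, the second misses the overall pass mark.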

Do the scores represent the number of correct items?

Now, considering the above information, do you only need to pass roughly 30% of the items per section and 50% of the overall exam? If you think about it this way, such numbers signify incredibly low passing thresholds. This must mean that the JLPT is a walk in the park.

But that is a hasty and careless thought. Many test-takers have this misconception, so they tend to underestimate the test. This is because they interpret the numbers as raw scores, which are simply the total number of correct responses in the test.

The JLPT, however, uses scaled scores. As you’ll see in the succeeding discussion, scaled scoring can be a much harsher scoring system than using raw scores. It can easily weed out examinees who aren’t sufficiently prepared for the JLPT.

Therefore, the figures on your score report DO NOT represent the number of correct items in the exam. To illustrate, the official mock exam for N3 has 39 items for Reading Comprehension, but the maximum score for that section is 60 points.


Disclaimer: I am neither a mathematician nor a statistician, so the following discussion is limited to the best of my knowledge and research.

What is a scaled score?

A scaled score is the result of converting a raw score onto a common scale using a statistical process that adjusts for differences in difficulty among different versions of the exam.

The primary purpose of scaled scores is to maintain comparability. Even with the best efforts of experienced test-makers, it is nearly impossible to write a new set of questions that has exactly the same level of difficulty as the questions in previous tests. Scaling is applied to address this limitation.

Why can’t raw scores be used?

Consider this example: A group of students was given a difficult exam. Most of the students earned scores in the range of 40-60%. However, suppose the same group was given an easy exam instead. Most of the scores now fall in the range of 80-100%.

Does this mean that the skill level of students increased? No. Their skill level at one point in time couldn’t have changed just because they were given a different set of exam questions.
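To make the example concrete, here is a small sketch. It assumes a simple logistic model in which the chance of a correct answer depends only on the gap between a student's ability and an item's difficulty. The model, the ability value, and the item difficulties are all my own assumptions for illustration; none of them come from the JLPT.

```python
import math

def chance_correct(ability, difficulty):
    """Probability of a correct answer under a simple logistic model (an assumption)."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def expected_percent_correct(ability, difficulties):
    """Average chance of success over all items, expressed as a percentage."""
    chances = [chance_correct(ability, d) for d in difficulties]
    return 100.0 * sum(chances) / len(chances)

student_ability = 0.5                        # the same student sits both exams
hard_exam = [0.5, 1.0, 1.5, 0.0, 1.0]        # hypothetical item difficulties
easy_exam = [-2.0, -1.5, -2.5, -1.0, -2.0]   # hypothetical item difficulties

print(round(expected_percent_correct(student_ability, hard_exam)))
print(round(expected_percent_correct(student_ability, easy_exam)))
```

With these made-up numbers, the same student is expected to score roughly 43% on the hard exam and roughly 90% on the easy one, even though the ability value never changed.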

Are raw scores enough to measure one’s ability?

This is why raw scores are prone to misinterpretation. Examinees with high Japanese proficiency may get low scores because of a difficult exam, while examinees with low proficiency may get high scores because of an easy exam.

Likewise, two persons with the same level of proficiency may get different results, while two persons with different levels of proficiency may get the same result.

What is the benefit of using scaled scores?

To address such problems in interpretation, scaling uses a statistical method to adjust for differences in difficulty between exams.

As a result, the reported score reflects the examinee’s Japanese proficiency rather than the difficulty of a particular exam. This minimizes misinterpretations and inappropriate inferences. Thus, scores are measured more fairly and reliably.

By accounting for difficulty factors in scoring, JLPT scores become comparable, whether you took the same exam level in July or December or even in different years.

Other well-known standardized tests that report scaled scores include the English proficiency tests TOEFL and IELTS and the US college entrance exams SAT and ACT.

In the JLPT, as in many other large-scale standardized tests, scaled scores are calculated using a statistical framework called Item Response Theory (IRT).

What is Item Response Theory (IRT)?

The Item Response Theory (IRT) refers to a family of mathematical models which attempt to explain the relationship between observed responses to an item and the examinee’s underlying trait.

Put simply, in the context of test-taking, IRT models the likelihood of answering an item correctly (the observed response) based on the ability of the examinee (the underlying trait). Persons with high ability are more likely to answer correctly, while those with low ability are less likely to answer correctly.

Such probabilities are estimated based on certain characteristics of the test question which are also called “parameters.”

What item characteristics are considered in IRT?

Depending on the type of IRT model used, there could be one or more parameters involved in the scoring process. The JLPT’s official website does not openly disclose the parameters applied in computing the scaled scores.

However, many well-known standardized tests use three parameters which are:

  • Difficulty parameter – The ability level at which a test-taker has a 50% probability of answering the question correctly (before any allowance for guessing). It tells how difficult a question is. This parameter is always included in computing scaled scores for exams.
  • Discrimination parameter – The degree to which a question can differentiate highly proficient test-takers from less-proficient test-takers.
  • Guessing parameter – The probability that test-takers will obtain the correct answer by guessing.

How is IRT applied to compute scaled scores?

IRT serves as the framework within which scaled scores are computed. As mentioned earlier, this theory is focused on the probability that a test-taker will answer an item correctly.
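For the curious, the three parameters listed above are usually combined in what textbooks call the three-parameter logistic (3PL) model. The sketch below shows that textbook formula only. The JLPT does not disclose which model or parameters it uses, so the function name and the item values here are hypothetical.

```python
import math

def probability_correct(ability, difficulty, discrimination=1.0, guessing=0.0):
    """Textbook three-parameter logistic (3PL) item response function."""
    logistic = 1.0 / (1.0 + math.exp(-discrimination * (ability - difficulty)))
    # The guessing parameter sets a floor: even very weak test-takers can get lucky.
    return guessing + (1.0 - guessing) * logistic

# Hypothetical four-choice item: fairly hard, with a 25% chance of a lucky guess.
for ability in (-2.0, 0.0, 1.0, 3.0):
    p = probability_correct(ability, difficulty=1.0, discrimination=1.5, guessing=0.25)
    print(f"ability {ability:+.1f}: probability {p:.2f}")
```

The probability rises with ability but never drops below the guessing floor. Note that when the guessing parameter is zero, a test-taker whose ability equals the item's difficulty has exactly a 50% chance of answering correctly.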

One of the methods in determining the test-takers’ scores in an IRT-based test is “pattern scoring.” The score is based on a mathematical estimate of the test-taker’s ability, and this estimate varies depending on which items the test-taker answered correctly, or the “answering pattern.”

So, for a certain question or item, there are two possible outcomes: it is answered either correctly or incorrectly. When the outcomes for all items in the exam are combined, an answering pattern is formed.

Based on the answering pattern, the ability estimate is derived and is then translated into a scaled score through mathematical processes called scaling and equating. In the case of the JLPT, the scale is a set of numbers that ranges from 0 to 60 for one scoring section.
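Here is a rough sketch of pattern scoring, assuming a two-parameter logistic model and a plain grid search for the most likely ability. The item parameters are invented, and the final conversion of the ability estimate to the 0 to 60 scale is left out because the JLPT does not publish it.

```python
import math

def p_correct(theta, a, b):
    """Two-parameter logistic model: a = discrimination, b = difficulty."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def log_likelihood(theta, items, pattern):
    """Log-probability of observing this exact answering pattern at ability theta."""
    total = 0.0
    for (a, b), correct in zip(items, pattern):
        p = p_correct(theta, a, b)
        total += math.log(p) if correct else math.log(1.0 - p)
    return total

def estimate_ability(items, pattern, low=-4.0, high=4.0, steps=801):
    """Grid-search maximum-likelihood estimate of ability (a simple stand-in method)."""
    grid = [low + i * (high - low) / (steps - 1) for i in range(steps)]
    return max(grid, key=lambda theta: log_likelihood(theta, items, pattern))

# Hypothetical (discrimination, difficulty) pairs for four items.
items = [(0.6, -1.0), (0.8, -0.5), (1.5, 0.5), (2.0, 1.0)]
pattern_a = [1, 1, 0, 0]   # two correct answers, both on the easier items
pattern_b = [0, 0, 1, 1]   # two correct answers, both on the harder items

print(estimate_ability(items, pattern_a))
print(estimate_ability(items, pattern_b))
```

Both patterns have the same raw score (two out of four), yet the estimated ability differs because the estimate depends on which items were answered correctly. That is the key difference from simply counting correct answers.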

For more information, see the research publications on IRT on the ETS website. ETS also offers a simplified, non-mathematical explanation. ETS is the organization that develops the TOEFL, TOEIC, and GRE tests.

What is a percentile rank?

Details of test results are shown at the bottom of the official score report

When you receive your JLPT certificate of result and scores, you will find a percentile rank right beside your test scores. This should not be interpreted as a percent-correct score.

The percentile rank is a score that indicates your relative position among other test-takers who also took the same exam. It indicates the percentage of the test-takers who scored lower than you.

Let’s use the image above to illustrate. If your percentile rank is 99.5, this means that you performed better than 99.5% of all the other test-takers. In other words, 99.5% of the group earned a score that is lower than your score of 175 points, and you belong in the top 0.5%.
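The definition translates directly into code. The group of scores below is hypothetical and far smaller than any real group of examinees.

```python
def percentile_rank(your_score, group_scores):
    """Percentage of the group that scored strictly lower than your_score."""
    lower = sum(1 for score in group_scores if score < your_score)
    return 100.0 * lower / len(group_scores)

# Hypothetical group of ten scaled scores.
group = [62, 75, 88, 90, 95, 101, 110, 124, 150, 175]
print(percentile_rank(110, group))
```

Six of the ten scores are lower than 110, so the function returns 60.0. Notice that the calculation never looks at how many items anyone answered correctly.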

Who are included in the percentile ranking?

The percentile rank is useful for objectively comparing your performance within a specified group. In the JLPT, this group consists of those who took the same level of the test on the six most recent occasions.

If you took N3 in December 2021, then the group by which you are ranked will be composed of examinees who took the N3 exam since December 2018. Note that the July 2020 JLPT was canceled at all test sites, so the range now reaches back to the December 2018 exam.
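The six-occasion window can be worked out mechanically. The helper below is my own sketch: it assumes exams are held only in July and December and that the window includes the exam you took, and it skips any canceled occasion passed to it.

```python
def ranking_window(year, month, occasions=6, cancelled=((2020, 7),)):
    """List the most recent test occasions, newest first, skipping canceled ones."""
    window = []
    while len(window) < occasions:
        if (year, month) not in cancelled:
            window.append((year, month))
        # Step back one session: December -> July of the same year,
        # July -> December of the previous year.
        if month == 12:
            month = 7
        else:
            year, month = year - 1, 12
    return window

print(ranking_window(2021, 12))
```

For December 2021, the list runs from (2021, 12) back to (2018, 12), with July 2020 left out.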

If you took the December 2021 exam, you will be ranked against all other examinees from December 2018 to 2021

What is the average score of JLPT examinees?

Since the percentile rank shows one’s standing in a group of test-takers, we can treat the score at the 50th percentile as the typical score: half of the examinees scored below it. (Strictly speaking, this is the median rather than the arithmetic mean, but it serves the same purpose here.)

Statistics of past exams are posted on the official JLPT website, and they include information on percentile ranks. Using such data, we can infer what the average score is for each JLPT level.

Scaled scores and their percentile ranks for the July 2021 N3 exam

In the above image, a scaled score of 90 is nearly equivalent to the 50th percentile for N3 in the July 2021 exam. Comparing this score to the overall pass mark of 95 points for N3, this average score is slightly lower. Let’s see the 50th percentile for other levels of the same exam:

Comparison of overall pass marks and scores for the 50th percentile in the July 2021 JLPT

Since the official data only provides percentile ranks for scaled scores at intervals of 5 points, we cannot get the exact score for the 50th percentile. Instead, we’ll use the scores immediately above and below the 50th percentile.
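The lookup itself is simple. In the sketch below, the table values are invented round numbers for illustration and are not official JLPT data. The function assumes that percentile ranks increase with the scaled score.

```python
def bracket_percentile(table, target=50.0):
    """Return the tabulated scores just below and just above the target percentile.

    `table` maps scaled scores (in 5-point steps) to percentile ranks.
    """
    below = max((s for s, rank in table.items() if rank <= target), default=None)
    above = min((s for s, rank in table.items() if rank >= target), default=None)
    return below, above

# Invented excerpt of a score-to-percentile table (NOT official data).
sample = {80: 35.0, 85: 42.0, 90: 49.0, 95: 56.0, 100: 63.0}
print(bracket_percentile(sample))
```

With this made-up table, the 50th percentile falls between the scores of 90 and 95, so the function returns (90, 95).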

As you can see in the preceding table, the average score of test-takers for all levels is about 5 to 10 points lower than the overall pass mark. Thus, more than half of the test-takers fail the exam. What does this mean for you?

In order to pass the JLPT at the minimum, you need to get a score that is above the average. In other words, you need to be better than the typical test-taker.

We could also confirm this information with the percentage of examinees that passed the JLPT, but percentile ranks give us an idea about their actual level of performance.

Lastly, it is worth remembering that percentile ranks are not spaced at equal intervals of ability. For example, the gap between the 90th and 95th percentiles typically reflects a larger difference in proficiency than the gap between the 45th and 55th percentiles.
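One way to see this is to assume, purely for illustration, that proficiency follows a bell-shaped (normal) distribution. The JLPT does not state this, so treat it as my assumption. Python's standard library can then convert percentiles into positions on that curve.

```python
from statistics import NormalDist

def ability_gap(lower_percentile, upper_percentile):
    """Gap in standard-deviation units, assuming normally distributed ability."""
    curve = NormalDist()  # mean 0, standard deviation 1
    return curve.inv_cdf(upper_percentile / 100) - curve.inv_cdf(lower_percentile / 100)

print(round(ability_gap(90, 95), 2))  # 5 percentile points near the top
print(round(ability_gap(45, 55), 2))  # 10 percentile points in the middle
```

Under this assumption, the five-point step from the 90th to the 95th percentile covers a wider stretch of ability than the ten-point step from the 45th to the 55th, because test-takers are crowded together in the middle of the distribution.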

What is the purpose of reference information?

Reference information is included as part of the JLPT score report

When you receive your score report, you will see a box containing reference information. It indicates, as a letter grade, roughly what percentage of items you answered correctly.

Since the JLPT reports scaled scores, we cannot determine actual raw scores. Thankfully, the reference information gives us some insight into raw scores, although it is limited to the Language Knowledge section. Below are the criteria for raw scores:

A: The number of correct responses is 67% or higher
B: The number of correct responses is between 34% and 66%
C: The number of correct responses is less than 34%
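These criteria can be written as a small function. The function name is my own, and I am assuming that a percentage falling between 66% and 67% counts as a “B”, since the criteria above leave that gap unstated.

```python
def reference_grade(correct, total):
    """Map a raw count of correct answers to the A/B/C reference grade."""
    percent = 100.0 * correct / total
    if percent >= 67:
        return "A"
    if percent >= 34:   # assumption: anything from 34% up to just under 67% is a B
        return "B"
    return "C"

print(reference_grade(20, 30))
print(reference_grade(21, 30))
```

Here, 20 out of 30 (about 66.7%) gives a “B”, while 21 out of 30 (70%) gives an “A”.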

Notice that the criteria above simply divide raw scores into three roughly equal ranges. Thus, they shouldn’t be interpreted in the same way as the letter grades commonly used in the U.S., wherein an “A” is equivalent to a score of 90 to 100, a “B” is equivalent to 80 to 89, and so forth.

It is possible to pass a section even if you get a “B” in it, but getting a “C” will almost certainly result in failure.


Comparison with the old JLPT

Until 2009, the JLPT had four levels: 4-kyuu to 1-kyuu. The gap between 2-kyuu (N2 equivalent) and 3-kyuu (N4 equivalent) was too wide, so a new level, N3, was introduced to bridge that gap.

For a detailed and side-by-side comparison between the old and new JLPT, please refer to the following links:
Comparison of test content and scope
Comparison of test sections, test time, and scores

Can the scores with the new JLPT be compared to the old JLPT?

The scores of the old and new versions of the JLPT cannot be compared. The main reason for this is the difference in scoring. The old JLPT used raw scores, whereas the new JLPT uses scaled scores.

The old scoring system was much simpler because raw scores alone determined pass or fail results. From 4-kyuu to 2-kyuu, the total score had to be at or above 60% to pass the test. For 1-kyuu, the minimum passing grade was higher, at 70%.

ごうかく・Passed ふごうかく・Failed

If you can pass the old JLPT, can you also pass the new JLPT?

Yes, those who have the competence to pass the old JLPT can pass the new test. Using statistical analysis, the pass marks for the new test have been set so that they closely match those of the old test. This is explained on the FAQ page on the official JLPT website.

However, as we already know about scaled scores, getting a 60% raw score (or 70% for N1) does not necessarily guarantee a passing result, because other factors are considered in the complex scoring process. To be safe, you should aim for a higher raw score to secure a passing mark.

We could also use this information as a benchmark for evaluating mock exam scores while reviewing for the JLPT. When taking a full mock exam, your percent-correct score should be at or above the passing percentages of the old JLPT.
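As a quick self-check after a mock exam, the comparison might look like the sketch below. The 60% and 70% figures are the old pass percentages quoted above. The function name and the item counts are hypothetical, and clearing this benchmark is a rule of thumb, not a guarantee of passing the real test.

```python
def meets_mock_benchmark(correct, total, level):
    """Compare a mock exam's percent-correct score with the old JLPT pass percentage."""
    benchmark = 70 if level == "N1" else 60   # old pass percentages quoted in this article
    percent = 100.0 * correct / total
    return percent >= benchmark, round(percent, 1)

# Hypothetical mock exam: 68 correct answers out of 105 items.
print(meets_mock_benchmark(68, 105, "N3"))
print(meets_mock_benchmark(68, 105, "N1"))
```

The same raw result of about 64.8% clears the benchmark for N3 but falls short of the stricter one for N1.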


Debunking misconceptions on JLPT scoring

Over the past few years that I’ve been studying Japanese, I’ve heard a good chunk of misinformation on JLPT scores.

Let’s look at each of them and evaluate why it is false:

(1) As long as I get half of the exam items correct, I’ll definitely pass.
Take the JLPT with full effort if you want to pass. Don’t underestimate it!

The overall pass mark is near or at the middle of the score range, but these are scaled scores. They are a measure of your ability, not the number of items answered correctly. If anything, getting only half of the items correct will most likely lead to failure.

We can also use the old JLPT passing scores as a good starting point to evaluate raw scores. Here, one must answer at least 60% to 70% of the questions correctly.

(2) I got a point deduction in one section! So I guess I only had one incorrect answer.

Again, the results of the JLPT show scaled scores. With scaled scores, it is possible to earn a perfect score even if you missed 1 or 2 items.

I experienced this myself when I took N4. After checking with my mentor and my classmates, I was certain that I had answered a question in the Language Knowledge section incorrectly. In the end, I still got a perfect score.

(3) Each question item merits different points–around 3 points for hard ones and 1 point for easy ones.

This statement came from a test proctor when I took N5. However, computing scaled scores is much more complex than assigning point values to each question.

Such a scoring method can be flawed because a test-taker with low proficiency can correctly answer a difficult question by guessing, thus earning more points. Conversely, an easy question can trip up a person with high proficiency through simple carelessness. Neither instance necessarily reflects one’s true ability.

The same type of scoring process (assigning point values depending on perceived difficulty) was used in the JLPT prep book series, Moshi to Taisaku, to compute the score in its mock exam. Perhaps this is where the proctor got her idea on scoring…

(4) To calculate your score, add up all correct responses and then divide by the number of total items in the exam. Then, multiply the result by 180 points.

This calculation merely applies the percent-correct score to the maximum scaled score. Such simplistic and straightforward math is far different from the scaling process that is used to derive the final reported scores.

Amusingly, this statement came from a JLPT proctor who shared this information with new proctors during the test briefing. (Yikes!) I hope it didn’t spread widely.

(5) The July exam tends to be harder, so take the December exam to increase your chances of passing.
Because the JLPT uses scaled scores, it is irrelevant whether the July or December exam is more difficult.

Whether one version of the test is more difficult than the other is a subjective opinion. But even if that is the case, the process of computing scaled scores will make your test results comparable to those of previous and future versions of the JLPT. This is the beauty of scaled scores.

Take the JLPT whenever you’re ready. Don’t avoid it just because you fear that its level of difficulty will change. With the semi-annual schedule of the test, you have only two chances a year to take it.

(6) Getting an “A” in the reference information means I must have gotten a near-perfect raw score.

Earning a score classified as “A” means that you correctly answered at least 2/3 of the questions, as stated by the criteria of the JLPT. Of course, you could get an “A” if you got near-perfect raw scores, but you could still get the same result even if your raw score were as low as 67%.


Conclusion

Now that we’ve explored each part of the JLPT score report, has your impression of the JLPT changed? Personally, when I learned about the scoring method for the first time, I realized that the JLPT is a different beast; it isn’t easy to pass at all.

Some technical concepts were introduced in this article, but nevertheless, I hope you found them interesting. With this knowledge, you can now understand why the following points will encourage you to prepare intensively for test day:

  • You cannot pass the exam by just guessing even if it is a multiple-choice format.
  • You must answer most of the questions correctly to pass.
  • Your test performance should be better than the average test-taker.

Above all, if you want to ensure your success with the JLPT, don’t settle for the minimum level of performance.

Do your best and aim for perfection!

Author: Francesca Galve

Japanese language enthusiast (JLPT N1). Master's student in Tokyo, Japan. Accountant by profession.
