Technical Grading Info


My Background

I have been in the card collecting game my entire life. But this section is not about my hoarder tendencies with collectibles; it is about my professional expertise in the area. My PhD is literally in measurement: how we, as humans, accurately measure things, from obvious things such as weight to less obvious things like personality. My primary focus has been on how people make numerical ratings of material, for example, how a teacher assigns a numerical grade to a book report. It is not a far leap to see the similarities between a room full of teachers grading English essays on a scale of 1-10 and a room full of card graders assigning cards a grade of 1-10. In fact, I think most would agree that the teacher's task of scoring an essay about any one of a thousand books, judging prose, syntax, language, grammar and comprehension, is a much more mentally challenging and open-ended task than the card grader's task of judging the corners, edges, surface and centering of a card.

The teacher's task has kept me up at night throughout my career. As a measurement specialist, I want to make sure teachers' ratings are as accurate and precise as possible. The stakes are also much higher for teachers, as a grading mistake may keep a kid from passing high school or getting into their preferred college. A mistake in card grading literally means nothing to PSA, BGS or HGA. Stakes aside, the two tasks are so similar that they can be discussed interchangeably.

Outline

I believe that most card grading companies that are actually trying to grade cards are equally accurate. My reason is that grading cards is simple. It probably isn't as hard as teachers grading essays, and, to be honest, teachers do a very good job. If we can do a good job at something harder, then we should be able to do a good job at something easier. On this page we will go through the concepts that help make sure essay grading and card grading go well. These topics are discussed at card grading companies in weekly meetings, and by following the basic principles outlined here you would end up with a good card grading company. It's not rocket science.

We are going to start with some foundational principles, Reliability, Validity and Error, and how these concepts interact to form the basis of a good grading system. We will then cover topics specific to card grading: intra-rater reliability, inter-rater reliability, training and rubrics.

Reliability

Reliability is a straightforward concept: it refers to the consistency of card grades. If you get a card graded and it comes back an 8, then you crack it out of its case and resubmit it, it should come back an 8! If it comes back a 2, something is fishy and the grading is not consistent. If you were to do this with every card ever submitted to PSA and correlate the first grade with the second grade for each card, you would get the reliability of PSA (what measurement people call test-retest reliability).

Take a look at the three examples below. Which one is the most reliable?

Card   First Grade   Resubmitted Grade
1      9             9
2      8             8
3      10            10
4      6             6
5      7             7

Card   First Grade   Resubmitted Grade
1      9             8
2      8             9
3      10            9
4      7             7
5      7             8

Card   First Grade   Resubmitted Grade
1      9             2
2      8             10
3      1             7
4      7             9
5      9             5

In the first table we see a reliability of 100%: each time a card was resubmitted it received the exact same grade. In the second we see a reliability of 64% (a correlation of 0.64). There is a relationship between the initial grade and the resubmitted grade, but it's not perfect. The final table shows a reliability of essentially 0. The correlation actually comes out slightly negative (about -0.27), which with only five cards simply means there is no consistent relationship between the first and second grades. All grading companies want to be, and claim to be, the first table. In reality most companies are closer to the second table.
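The reliability figures for the three tables can be reproduced by correlating each card's first grade with its resubmitted grade. This is a minimal sketch using a hand-written Pearson correlation; the function name `retest_reliability` is our own, and the grades are the ones from the tables above.

```python
from math import sqrt


def retest_reliability(first, second):
    """Pearson correlation between first grades and resubmitted grades."""
    n = len(first)
    mean_first = sum(first) / n
    mean_second = sum(second) / n
    # Covariance and variances (the 1/n factors cancel in the ratio).
    cov = sum((a - mean_first) * (b - mean_second) for a, b in zip(first, second))
    var_first = sum((a - mean_first) ** 2 for a in first)
    var_second = sum((b - mean_second) ** 2 for b in second)
    return cov / sqrt(var_first * var_second)


# (first grades, resubmitted grades) for cards 1-5 in each table.
tables = {
    "first table": ([9, 8, 10, 6, 7], [9, 8, 10, 6, 7]),
    "second table": ([9, 8, 10, 7, 7], [8, 9, 9, 7, 8]),
    "final table": ([9, 8, 1, 7, 9], [2, 10, 7, 9, 5]),
}

for name, (first, second) in tables.items():
    print(f"{name}: r = {retest_reliability(first, second):.2f}")
# first table: r = 1.00
# second table: r = 0.64
# final table: r = -0.27
```

Five cards is far too few for a trustworthy estimate; with so little data a single odd card can swing the result a lot, which is why a real reliability study would resubmit hundreds or thousands of cards.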

Validity

Validity is just a more complicated term for accuracy. If a card is in perfect condition it SHOULD get a perfect grade. If a perfect card gets a 9, that is not as accurate as a 10, but it is more accurate than a 4. In reality we don't know the true condition of our cards, so we cannot calculate accuracy exactly. However, we know that companies do want to be accurate, and they do want to be reliable. What is important is the relationship between accuracy and reliability.

Take for example this image:

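[Image: target diagrams illustrating high accuracy-high reliability, low accuracy-high reliability, and low accuracy-low reliability]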

You will notice there are examples of high accuracy-high reliability, low accuracy-high reliability and low accuracy-low reliability. There is no example of high accuracy-low reliability, and it makes sense if you think about it: you cannot be accurate if you are not reliable. It is essentially impossible to get a number for accuracy in the card grading world, but we could calculate reliability if we resubmitted enough cards. In that case reliability would be the only data we had, and it would serve as a proxy for accuracy. Mathematically speaking, reliability sets the upper bound on accuracy. In classical test theory, the correlation between grades and true condition can be at most the square root of the reliability, so the share of variation in grades that reflects true condition can be at most the reliability itself. If a company's grades are 65% reliable, at most 65% of the variation in its grades can reflect the cards' true condition. Its real accuracy is probably much lower than that; 65% is just the absolute maximum it can be. This concept is why reliability is so important.
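The ceiling can be checked with a small simulation under classical test theory's assumption that every grade is the card's true condition plus random error. This is a sketch, not a model of any real company: the normal distributions, the error size (chosen so reliability lands near the 65% example) and the continuous, unrounded grades are all assumptions. Because the simulation knows each card's true condition, it can measure accuracy directly, which is exactly what we cannot do with real cards.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


random.seed(42)
N = 100_000
TRUE_SD = 1.0    # assumed spread of true card condition
ERROR_SD = 0.73  # assumed grading error; reliability = 1 / (1 + 0.73**2), about 0.65

# True condition is unknowable in practice; the simulation lets us peek at it.
true_condition = [random.gauss(8.0, TRUE_SD) for _ in range(N)]
first_grade = [t + random.gauss(0.0, ERROR_SD) for t in true_condition]
resubmitted = [t + random.gauss(0.0, ERROR_SD) for t in true_condition]

reliability = pearson(first_grade, resubmitted)  # what resubmitting can measure
validity = pearson(first_grade, true_condition)  # accuracy: agreement with the truth

print(f"reliability (test-retest):       {reliability:.2f}")
print(f"validity (corr. with truth):     {validity:.2f}")
print(f"ceiling, sqrt(reliability):      {sqrt(reliability):.2f}")
print(f"validity squared (vs. ceiling):  {validity ** 2:.2f}")
```

In this idealized setup the only source of inaccuracy is random error, so accuracy lands right at the ceiling (validity squared is about equal to reliability). Errors that are consistent but wrong, the low accuracy-high reliability target, would pull accuracy below the ceiling without lowering reliability.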

Error

Error is the third important principle that affects card grading companies. Error is anything that reduces the reliability, and therefore the validity, of a rating. If a company's grading system is 75% reliable, then it is 25% unreliable, and that 25% is what we will call error. Imagine a common situation: a card gets a 7, is cracked out of its case, and is assigned to a different rater who gives it an 8. The inconsistency between the grades is likely due to getting a different rater, so this difference is error from having a different rater. Lots of things contribute to differences across raters, which we will call rater error. I am sure you can think of dozens, but here is a non-comprehensive list:

▪ Different Training

▪ Different Experience

▪ Different Shifts

▪ Different Age

All these sources add up, increasing error and decreasing reliability. There is a name for measuring the consistency between two raters: inter-rater reliability. Differences can also occur when the same rater is given the exact same card twice. Some reasons why a rater may give a different rating are:

▪ Tiredness

▪ Personal Problems

▪ Mood (maybe they are angry one day)

▪ Different environment or office

▪ Lighting

The list could go on forever, but when we measure a person's consistency with themselves we call it intra-rater reliability. Grading companies are concerned with all forms of reliability. They try to make individuals consistent with themselves by providing them with similar grading conditions, and consistent with each other through consistent, quality training. Companies that want to improve their product and give accurate ratings do whatever they can to increase reliability and reduce error. Most likely, companies do extensive training and practice with their employees, with regular retraining.
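Intra-rater and inter-rater reliability are computed the same way; the only difference is which pairs of grades you compare. As a minimal sketch, assuming made-up grades for six cards, this uses exact agreement (identical grades) and adjacent agreement (within one grade), two simple summaries commonly reported in essay scoring alongside correlations. The function name `agreement` and all the data are hypothetical; the correlation approach from the resubmission tables would work here too.

```python
def agreement(grades_a, grades_b):
    """Return (exact, adjacent) agreement rates for two sets of grades on the same cards."""
    pairs = list(zip(grades_a, grades_b))
    exact = sum(a == b for a, b in pairs) / len(pairs)
    adjacent = sum(abs(a - b) <= 1 for a, b in pairs) / len(pairs)  # within one grade
    return exact, adjacent


# Made-up grades for the same six cards, for illustration only.
rater_a_monday = [9, 8, 10, 7, 6, 9]
rater_a_friday = [9, 8, 9, 7, 6, 9]   # same rater, same cards, different shift
rater_b_monday = [8, 8, 10, 6, 7, 9]  # different rater, same cards

intra = agreement(rater_a_monday, rater_a_friday)  # intra-rater reliability
inter = agreement(rater_a_monday, rater_b_monday)  # inter-rater reliability
print(f"intra-rater: exact {intra[0]:.0%}, adjacent {intra[1]:.0%}")  # exact 83%, adjacent 100%
print(f"inter-rater: exact {inter[0]:.0%}, adjacent {inter[1]:.0%}")  # exact 50%, adjacent 100%
```

Exact agreement is the stricter measure; adjacent agreement shows whether disagreements are small (a 7 versus an 8) or alarming (a 2 versus a 9).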

Despite everything a company does, it is impossible to eliminate error completely and achieve perfect reliability. It is a constant struggle. It is unreasonable for users like us to expect that every card is accurately graded and that any time we resubmit a card we will receive the same grade. The internet is full of stories of people resubmitting cards and getting different grades, and that is inevitable. If we kept resubmitting the same card to the same company, it WOULD eventually come back with a different grade. If the company is reliable it will probably return the same grade most of the time, but eventually the card will be graded higher or lower, because company grades contain some error and are therefore not 100% consistent.
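The "eventually" can be made concrete with a little arithmetic. Assuming, simplistically, that each regrade independently returns the original grade with some probability p, the chance that at least one of n regrades differs is 1 - p^n. The 90% figure and the function name below are hypothetical.

```python
def chance_grade_changes(p_same, submissions):
    """Probability that at least one of `submissions` regrades differs from the original.

    Assumes each regrade independently returns the original grade with
    probability p_same (a simplification; real regrades may be correlated).
    """
    return 1 - p_same ** submissions


# Hypothetical: the company returns the same grade 90% of the time.
for n in (1, 5, 10, 25):
    print(f"{n:>2} resubmissions: {chance_grade_changes(0.9, n):.0%} chance of a different grade")
# 10%, 41%, 65% and 93% respectively
```

Even a quite reliable company will eventually return a different grade if the same card is resubmitted enough times; only p = 1 (perfect reliability) keeps the chance at zero.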

One final measure these companies all use to increase their accuracy and reliability is to have each card be rated by multiple raters. If the raters agree then that is usually the grade the card will get. When the raters disagree, a more experienced rater will rate the card and settle any differences. This is also where AI rating systems speed up the turnaround process and reduce costs. Instead of having two human raters rate each card there will be one human and one AI image system that will rate the card. If the human and AI rater agree then that is the grade the card receives. If there is disagreement, a second human rater will resolve the difference.
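The multiple-rater rule described above is simple enough to write down. This is a minimal sketch with hypothetical names and data: two independent grades (human and human, or human and AI), and an adjudicator who is only consulted when they disagree. Passing the adjudicator as a function means the expensive expert review happens only on disagreement.

```python
from typing import Callable, List, Tuple


def resolve_grade(first: int, second: int, adjudicate: Callable[[], int]) -> Tuple[int, bool]:
    """Return (final grade, whether adjudication was needed).

    `adjudicate` stands in for the more experienced (or second human) rater
    and is only called when the first two grades disagree.
    """
    if first == second:
        return first, False
    return adjudicate(), True


# Hypothetical cards: (human grade, AI grade, grade the adjudicator would give).
cards: List[Tuple[int, int, int]] = [(9, 9, 9), (8, 9, 8), (10, 10, 10), (7, 6, 7)]

results = [resolve_grade(human, ai, lambda g=expert: g) for human, ai, expert in cards]
final_grades = [grade for grade, _ in results]
escalated = sum(needed for _, needed in results)
print(final_grades, escalated)  # [9, 8, 10, 7] 2
```

The share of cards that need escalation is itself a rough agreement measure: the more often the first two raters disagree, the more expert time the process costs.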

Rubrics

A rubric is a grading guide that gives explicit criteria for scoring. Some companies, such as PSA and BGS, actually publish their scoring rubrics.

Essentially, a rubric gives graders specific written and visual guidance on what grade a card should receive. Ideally the rubric would go into detail on each of the subgrades. I am personally a big fan of companies publishing their rubrics (GO PSA AND BGS!), especially when they also give subgrades (GO BGS!), as this gives customers feedback so they can see exactly where their cards suffered.

From an accuracy and reliability standpoint, a rubric is essential. It helps stabilize scores over time as well as across raters, so that grades from 10 years ago are comparable to grades from today.
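To see why explicit criteria stabilize grades, here is a minimal sketch of one rubric criterion written as a lookup table. The thresholds, names and the single left/right border measurement are all hypothetical and are NOT any company's published standard; real rubrics also cover corners, edges and surface and describe them in words and pictures rather than code.

```python
# Hypothetical centering rubric for illustration only; these thresholds are
# NOT PSA's or BGS's published standards.
# Each entry: (largest allowed share of the wider border, subgrade).
CENTERING_RUBRIC = [(0.55, 10), (0.60, 9), (0.65, 8), (0.70, 7), (0.80, 6)]
FLOOR_GRADE = 5


def centering_subgrade(left_border_mm: float, right_border_mm: float) -> int:
    """Map measured left/right border widths to a centering subgrade."""
    wider_share = max(left_border_mm, right_border_mm) / (left_border_mm + right_border_mm)
    for limit, grade in CENTERING_RUBRIC:
        if wider_share <= limit:
            return grade
    return FLOOR_GRADE


# Any two raters who measure the same borders land on the same subgrade.
print(centering_subgrade(3.0, 3.0))  # 50/50 split -> 10
print(centering_subgrade(3.5, 2.5))  # about 58/42 -> 9
print(centering_subgrade(4.0, 2.0))  # about 67/33 -> 7
```

With the criteria fixed in advance, disagreement can only come from measuring the card differently, not from each rater carrying a different mental standard, which is the stability across raters and over time described above.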

Evaluating Companies

Any feedback a company provides its customers gives us a window into its quality. As consumers, we would love to see peer-reviewed studies on accuracy, or even the reliability figures these companies undoubtedly calculate. However, they do not publish them and probably never will, so we are left with scraps that we can try to use to judge the quality of ratings. Any information a company provides can and will be used against it at some point, so any information they give is important to evaluate.

The first piece of information companies give us is their pricing model. It is no harder to grade an '86 Jordan than it is to grade a $1.00 junk-bin card. Intuitively, we all understand this: a piece of cardboard is a piece of cardboard. What is printed on it shouldn't affect the difficulty of grading. It does affect the price of the card, so it is reasonable for shipping insurance and storage to cost more for expensive cards. But the actual grading should cost the same.

Faster turnaround times should cost more, but with everything else equal, two cards should cost the same amount to grade. The same is true of subgrades. Maybe a token fee for putting subgrades on a label is fair, but since card companies SHOULD be using a rubric to grade, they already know the subgrades; the difference should be the printing cost, or at least something reasonable. If a company charges different grading prices based on card value, or extreme amounts to add subgrades to labels, you are not paying for a higher quality grade; you are paying for public perception, or a little bit of a shakedown. While these aren't company killers, it helps to know that the company is being a little dishonest in its dealings.
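The pricing model argued for in the last two paragraphs can be written as a small fee function. This is a sketch under made-up numbers: the fee table, the insurance rate and the name `fair_fee` are all hypothetical and do not reflect any company's actual prices. Grading cost depends only on turnaround, subgrades add a token printing fee, and the only part that scales with a card's value is insurance.

```python
# All figures are hypothetical, for illustration only; not any company's prices.
TURNAROUND_FEE = {"economy": 20.00, "express": 75.00}  # faster costs more
SUBGRADE_LABEL_FEE = 2.00  # assumed token printing cost
INSURANCE_RATE = 0.01      # assumed: 1% of declared value for shipping and storage


def fair_fee(declared_value: float, turnaround: str = "economy", subgrades: bool = False) -> float:
    """Fee under the pricing model argued for above: grading cost ignores card value."""
    grading = TURNAROUND_FEE[turnaround]                # same for every card
    labels = SUBGRADE_LABEL_FEE if subgrades else 0.0   # printing, not extra grading
    insurance = INSURANCE_RATE * declared_value         # the only value-based part
    return grading + labels + insurance


print(f"${fair_fee(1.00):.2f}")                        # $1 junk-bin card: $20.01
print(f"${fair_fee(10_000.00):.2f}")                   # expensive card: $120.00
print(f"${fair_fee(10_000.00, 'express', True):.2f}")  # with speed and subgrades: $177.00
```

Under this model the gap between the junk-bin card and the expensive card is entirely insurance; a grading fee that itself climbs with declared value is what the paragraph above calls paying for perception.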

Rubrics. Companies should publish their grading rubrics, as a rubric tells customers what to look for before submitting their cards to the black box that is a card grading company.

Subgrades themselves. Subgrades are the only feedback a grading company gives users to show them how their card falls short of perfect. They matter because they open the company's grading procedures to criticism. If you see a surface subgrade of 8, you know to look at the surface for imperfections. This comes at a cost for the company, because if you cannot find the flaw where they say there is one, you may not submit another order to that company. Subgrades are a way for a company to show it is willing to stand behind its product. It is a weak way to do it, especially if they charge for subgrades, but it is better than nothing. A company that does not publish subgrades is not willing to submit its grading system to even the weakest form of criticism. It simply hides behind a general grade, relying on people being unable to tell whether the grade is accurate.

Finally, one of the most important pieces of public information a grading company gives is a population report. A population report is a public resource that lets people look up how many copies of a card have received each grade. If companies felt they didn't have to provide it, they surely wouldn't. However, it is a security tool that helps preserve the value of the card and gives buyers a way to check the authenticity of a grade. Faking card grades and slabs is a problem, and an easy way to look up cards and grades reduces fakes and maintains the integrity of the card and the brand. At this point in time, a card registry/population report is almost a requirement for a reputable company.
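The two jobs a population report does, counting grades per card and letting a buyer check that a slab's certification number matches the grade on its label, can be sketched in a few lines. The record layout, cert numbers, card names and function names are all hypothetical; real registries are databases behind a company's website.

```python
from collections import Counter, defaultdict

# Hypothetical graded-card records: (certification number, card, grade).
RECORDS = [
    ("00000001", "Card A", 10),
    ("00000002", "Card A", 9),
    ("00000003", "Card A", 10),
    ("00000004", "Card B", 8),
]


def population_report(records):
    """Count how many copies of each card received each grade."""
    report = defaultdict(Counter)
    for _cert, card, grade in records:
        report[card][grade] += 1
    return report


def verify_slab(records, cert, card, grade):
    """True if the registry lists this cert number for this card at this grade."""
    registry = {c: (name, g) for c, name, g in records}
    return registry.get(cert) == (card, grade)


report = population_report(RECORDS)
print(dict(report["Card A"]))                          # {10: 2, 9: 1}
print(verify_slab(RECORDS, "00000003", "Card A", 10))  # True: matches the registry
print(verify_slab(RECORDS, "00000004", "Card B", 10))  # False: label claims a higher grade
```

The second lookup is the anti-counterfeiting side: a fake slab has to reuse a real cert number, and the registry then shows a card or grade that does not match the label.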

Conclusion

I began with a pretty bold statement: most of the top grading companies are actually doing about the same quality job in terms of accuracy. With the information above, that may seem much more likely to you now. To get a good rating system you just need to:

1) Have a rubric.

2) Stick to the rubric.

3) Train raters.

4) Evaluate rater reliability.

5) Retrain raters with poor reliability.

6) Have multiple raters for each card, with an expert rater settling disagreements.

7) Provide a standardized environment and tools for all raters.

By doing these 7 things, a company will most likely have good reliability and will be doing everything it can to increase it. Any remaining error comes from things beyond the company's control (individual rater differences, personal problems, food poisoning, etc.) that affect all companies. Because these steps are simple and easy to implement, most card grading companies will have very similar reliabilities and accuracies.
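Steps 4 and 5 (evaluate rater reliability, retrain raters with poor reliability) lend themselves to a routine check. One minimal way to do it, assuming a calibration set of cards that an expert has already graded, is to correlate each rater's grades with the expert's and flag anyone below a threshold. The 0.80 cutoff, the data and the function names are hypothetical; a real program would use far more cards and would also track intra-rater consistency.

```python
from math import sqrt

RETRAIN_BELOW = 0.80  # hypothetical reliability threshold


def pearson(x, y):
    """Pearson correlation of two equal-length lists (undefined if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def raters_to_retrain(expert_grades, rater_grades, threshold=RETRAIN_BELOW):
    """Return raters whose correlation with the expert on the calibration set is below threshold."""
    return sorted(
        name
        for name, grades in rater_grades.items()
        if pearson(grades, expert_grades) < threshold
    )


# Made-up calibration set: six cards already graded by an expert.
expert = [10, 9, 8, 7, 6, 9]
raters = {
    "rater_1": [10, 9, 8, 7, 6, 9],  # matches the expert exactly
    "rater_2": [9, 9, 8, 7, 7, 9],   # small disagreements, same ordering overall
    "rater_3": [7, 10, 6, 9, 8, 6],  # little relationship to the expert
}
print(raters_to_retrain(expert, raters))  # ['rater_3']
```

Correlation rewards getting the ordering of cards right; a rater who is consistently one grade too harsh would still correlate perfectly, so a real check would pair it with an agreement rate or an average-difference measure.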

If you don't want to believe that all grading companies are about the same quality based on the information and assumptions above, there is more to consider. People pay more for a higher condition card, and in the sports card world that means a PSA 10. Every other collecting world is different, though. For Magic: The Gathering (MTG), the premier label isn't PSA but BGS. For Pokemon, it isn't PSA or BGS but CSG, which has one of the worst resale values in sports cards. For comic books it is CGC, which belongs to the same company group as CSG. Essentially, different collectibles that are sensitive to condition each have a preferred top grader when it comes to resale, but they are ALL different. This suggests that the companies are all doing the same thing and the only difference is perception, since if one were clearly superior it would be the best across the board.