When the Padres called up their “core four” last week, Fangraphs ran their KATOH algorithm to predict future value, and also calculated Mahalanobis Distance to produce historical comparables for the three new callups: Margot, Renfroe, and Asuaje.
Fangraphs has done like a million of these player comparison articles. They're some of my favorites: there's little subjectivity, the analysis isn't freely accessible to fans elsewhere, and the comparable names provide some great insight into possible career paths.
It also accomplishes a job – player comparisons – in cases where most humans couldn’t make an accurate judgement. For example, suppose we have three players with the following stat lines:
- .320, 12 HR, 0 SB
- .262, 12 HR, 32 SB
- .316, 20 HR, 29 SB
Which two players are most similar to one another? Honestly, I couldn’t tell you with any degree of confidence.
It’s not an easy question to answer because batting average, home runs, and stolen bases aren’t on the same scale. One home run is obviously not the same as one point of batting average, and the two stats aren’t equally predictable either, so we can’t just add and subtract raw statistics. (Technically you can, but your measurement will totally fucking suck.)
So before we can make any meaningful measurement of player comparables, we must first scale each statistical category in a way that makes one “home run unit” equal to one “batting average unit”. For example, if we treat each attribute as a normal, bell-shaped distribution, the unit might end up being overall population percentile. In that case, going from 55th to 57th percentile in batting average would be two batting average units. Once we’ve done that, or used some other similar method, we can use basic Euclidean geometry to determine which prospects are closest, i.e. most similar, to one another in this n-dimensional space, where n is the number of attributes.
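To make the scaling idea concrete, here is a minimal sketch using the three stat lines from earlier. It swaps in z-scores (standard deviations from the mean) rather than percentiles, and it computes the mean and standard deviation from just these three players, which is an assumption made purely for illustration; a real comparison would scale against a full population of prospects. The helper names `standardize` and `pairwise_distances` are made up for this example.

```python
import math
from itertools import combinations
from statistics import mean, pstdev

# Stat lines from the example above: (batting average, home runs, stolen bases)
players = {
    "Player 1": (0.320, 12, 0),
    "Player 2": (0.262, 12, 32),
    "Player 3": (0.316, 20, 29),
}

def standardize(rows):
    """Convert each column to z-scores: (value - column mean) / column std dev.

    Assumption: the three players themselves are the "population".
    """
    columns = list(zip(*rows))
    mus = [mean(col) for col in columns]
    sigmas = [pstdev(col) for col in columns]
    return [tuple((v - m) / s for v, m, s in zip(row, mus, sigmas)) for row in rows]

def pairwise_distances(names, rows):
    """Euclidean distance between every pair of rows, keyed by the name pair."""
    return {
        (a, b): math.dist(rows[i], rows[j])
        for (i, a), (j, b) in combinations(enumerate(names), 2)
    }

names = list(players)
rows = list(players.values())
raw = pairwise_distances(names, rows)                  # unscaled stats
scaled = pairwise_distances(names, standardize(rows))  # z-scored stats

for pair in raw:
    print(pair, f"raw={raw[pair]:.2f}", f"scaled={scaled[pair]:.2f}")
```

On the raw numbers, Players 2 and 3 look far closer than any other pair, but that is almost entirely because stolen bases have the largest numeric range and swamp everything else. After scaling, the three pairwise distances come out nearly equal, which fits the gut feeling that there is no obvious answer here.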
The exact algorithm Fangraphs uses for this normalization and distance calculation is called Mahalanobis Distance. Fangraphs inputs a variety of minor league data – walk rates, contact rates, power, speed, … the list is long – to make its comparison. When all the normalizations for a player are complete, his attribute values are compared to every prospect ever. The most similar prospects are those with the lowest Mahalanobis Distance, since that means they’re closer in the n-dimensional space.
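For the curious, here is a rough sketch of what a Mahalanobis Distance calculation looks like. Beyond putting every attribute on the same scale, the measure also accounts for attributes that move together (for instance, power and strikeouts), by using the covariance matrix of the population. Everything specific below is an assumption for illustration: the prospect pool, its two attributes (walk rate and strikeout rate), the numbers, and the helper names are all hypothetical, and this is not Fangraphs' actual code or inputs.

```python
import math

def covariance_matrix(rows):
    """Sample covariance matrix of the columns in rows."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    return [
        [sum((r[a] - means[a]) * (r[b] - means[b]) for r in rows) / (n - 1)
         for b in range(k)]
        for a in range(k)
    ]

def solve(matrix, vector):
    """Solve matrix @ x = vector by Gauss-Jordan elimination with partial pivoting.

    Assumes the matrix is invertible (raises ZeroDivisionError otherwise).
    """
    k = len(vector)
    aug = [row[:] + [v] for row, v in zip(matrix, vector)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]

def mahalanobis(x, y, cov):
    """sqrt((x - y)^T * cov^-1 * (x - y))"""
    diff = [a - b for a, b in zip(x, y)]
    return math.sqrt(sum(d * s for d, s in zip(diff, solve(cov, diff))))

# Hypothetical pool of past prospects: (walk rate %, strikeout rate %)
pool = {
    "Prospect A": (8.0, 20.0),
    "Prospect B": (10.0, 18.0),
    "Prospect C": (6.0, 25.0),
    "Prospect D": (12.0, 15.0),
    "Prospect E": (9.0, 22.0),
}
new_callup = (9.5, 17.0)  # hypothetical player we want comparables for

cov = covariance_matrix(list(pool.values()))
comps = sorted(
    (mahalanobis(new_callup, stats, cov), name) for name, stats in pool.items()
)
for distance, name in comps:
    print(f"{name}: {distance:.2f}")
```

The list prints from most similar to least similar, so the first name is the top comparable. When the covariance matrix is the identity (every attribute has unit variance and none are correlated), this reduces to plain Euclidean distance.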
Generally, the most similar players for a prospect have a Mahalanobis Distance between 0 and 2 units. Take Carlos Asuaje. The most similar player, useful utility infielder Mike Fontenot, is at a Mahalanobis Distance of 0.80. For Hunter Renfroe, former Detroit Tigers outfielder Karim Garcia is the most similar prospect, at a Mahalanobis Distance of 1.20.
For Manny Margot, though, the closest player is 4.98 Mahalanobis units away! That player is Grady Sizemore, whose 29.4 wins above replacement in his six team-controlled seasons prior to free agency makes him one of the best prospects of the past twenty years.
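For comparison, here’s how far away the closest comparable was for some other recent top prospects. Corey Seager’s was 0.68 away. Byron Buxton’s was 1.44. Dansby Swanson’s was 0.78. Trea Turner’s was 0.90. Orlando Arcia’s was 0.91.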
I could go on, but you’ll just have to trust me. Players are rarely as … rare as Manny Margot. Quite simply, there hasn’t been a single prospect in baseball history who was all that similar to him. And I’m not sure there’s ever been a prospect with a top Mahalanobis comparable as far away as Margot’s.
Why he’s so rare isn’t that hard to glean. He has excellent fielding metrics, runs the bases well, has fantastic contact metrics, and has even shown a little pop. And he’s only 21 years old, producing well in AAA. There just aren’t a lot of guys that check all those boxes.