A few weeks ago, I regressed as a writer. I regressed a lot, actually: twenty years' worth of slash line data regressed against twenty years of run scoring data in various ways. But — and this is a dangerous sentence, and usually a bad one — someone asked me a question on Twitter and I want to answer it. Namely: was batting average always the weakest correlation to run scoring among the slash line statistics, or has it only become so recently?
This is going to be a quick hitter. I broke the game down somewhat arbitrarily, using eras defined by OOTP Perfect Team. I started in 1947 and went up until 2000 (the results of the 2000s were in my previous article). Here’s what those 2000s results look like, which should both give you an idea of the correlations today and preview the format for the rest of the article:
R-Squared to Runs Scored, Various Stat Pairs
Without further ado, let’s get started.
Golden Years, 1947–1960
Now, these weren’t the golden years for me, because I wasn’t alive, but I guess that’s what some people call this era of baseball. Jackie Robinson! Ted Williams! Stan Musial! Willie Mays! Batting average mattered more, but it still didn’t matter:
R-Squared to Runs Scored, Golden Years
What do I mean by that? Well, if you predict run scoring with OBP and SLG, you get a 0.908 adjusted r-squared to actual runs scored. Predict run scoring with the entire triple slash line, and you get an adjusted r-squared of 0.910. Batting average did better on its own as a run-scoring predictor in this era, but using OBP and SLG was the gold standard in the golden years.
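The comparison behind every table in this article is the same: fit one regression on OBP and SLG, fit another on all three slash stats, and compare adjusted r-squared. Here's a minimal sketch of that comparison in pure Python, using the normal equations for ordinary least squares. The team-season numbers below are made up for demonstration only; they are not the article's data.

```python
# Sketch of the article's comparison: regress team runs on OBP/SLG,
# then on AVG/OBP/SLG, and compare adjusted r-squared.

def solve(A, b):
    # Gaussian elimination with partial pivoting for a small linear system.
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def adjusted_r_squared(predictors, y):
    # OLS via the normal equations (X'X)beta = X'y, then
    # adjusted r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1),
    # which penalizes models for carrying extra predictors.
    n, p = len(y), len(predictors[0])
    X = [[1.0] + row for row in predictors]  # intercept column
    k = p + 1
    XtX = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(k)]
           for a in range(k)]
    Xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(k)]
    beta = solve(XtX, Xty)
    yhat = [sum(X[i][a] * beta[a] for a in range(k)) for i in range(n)]
    ybar = sum(y) / n
    ss_res = sum((y[i] - yhat[i]) ** 2 for i in range(n))
    ss_tot = sum((y[i] - ybar) ** 2 for i in range(n))
    r2 = 1 - ss_res / ss_tot
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Hypothetical team seasons: (AVG, OBP, SLG, runs scored).
teams = [
    (0.265, 0.330, 0.410, 730), (0.250, 0.315, 0.390, 670),
    (0.275, 0.345, 0.440, 800), (0.240, 0.300, 0.370, 610),
    (0.260, 0.335, 0.425, 760), (0.255, 0.320, 0.400, 700),
    (0.270, 0.340, 0.450, 810), (0.245, 0.310, 0.380, 640),
]
runs = [float(r) for *_, r in teams]
obp_slg = [[o, s] for _, o, s, _ in teams]
avg_obp_slg = [[a, o, s] for a, o, s, _ in teams]

print(f"OBP/SLG:     {adjusted_r_squared(obp_slg, runs):.3f}")
print(f"AVG/OBP/SLG: {adjusted_r_squared(avg_obp_slg, runs):.3f}")
```

If batting average carried independent information, the second number would come in meaningfully higher than the first; in the article's real-world data, it barely budges.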
Baseball Boom, 1961–1979
This is a broad era that folds in some pitching-dominant years that led to rules changes, the early part of the speed era, and some early-60s home run mania. It’s also an era where, if you know OBP and SLG, you don’t need to know batting average to predict run scoring:
R-Squared to Runs Scored, Boom Years
Like the 1947–60 span, using OBP and SLG as predictors does just as well as using all three statistics. More specifically, OBP/SLG had a 0.922 adjusted r-squared to runs scored. The full AVG/OBP/SLG regression checks in at 0.923. Average… if you're already 99.89% of the way there, it'll get you that last tiny bit of explanatory power. That's not exactly a ringing endorsement.
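That 99.89% figure is just the ratio of the two adjusted r-squared values quoted above, i.e. the share of the full model's explanatory power that OBP and SLG capture on their own:

```python
# Share of the full model's fit captured by OBP/SLG alone (Boom Years).
obp_slg_r2 = 0.922  # adjusted r-squared, OBP + SLG only
full_r2 = 0.923     # adjusted r-squared, AVG + OBP + SLG
print(f"{obp_slg_r2 / full_r2:.2%}")  # prints "99.89%"
```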
Defensive Era, 1980–1992
Even though I wasn’t alive for a big chunk of this era and wasn’t following baseball for the vast majority of it, it’s one of my favorite eras, thanks to Ozzie Smith, my single favorite baseball player and, per my mom, the person I’ve most emulated in my life. I spent countless hours mimicking the defensive plays I saw on my “Ozzie, That’s a Winner” VHS tape, which my uncle had recorded on local access TV in St. Louis. I’m a lefty, so I was doing them backwards and they never led to me becoming a defensive wunderkind, but none of that mattered to me; I just wanted to be like Ozzie. Uh, where were we? Oh, right. Average didn’t matter:
R-Squared to Runs Scored, Defensive Era
Using the criteria from above, OBP/SLG checks in at 0.863, and an all-three-slash-stats regression checks in at 0.864. It’s interesting to note that OBP and SLG explain the lowest percentage of variation in run scoring in this era, which I attribute to the huge range in team baserunning strategy and effectiveness, but that’s not the point of this study. The point is that if you already know a team’s OBP and SLG, you don’t need to know their batting average to predict how many runs they scored.
The Power Years, 1993–2000
I cut this one off at 2000, since my previous article already covered the 21st century, but OOTP extends it to 2004. Regardless, you guessed it:
R-Squared to Runs Scored, Power Years
This time, the adjusted r-squared is the same whether you look at OBP/SLG or AVG/OBP/SLG. So there you have it: throughout the eras, the correlations have remained the same. If you’re trying to predict a team’s run scoring and already have their on-base percentage and slugging percentage, you can stop there. Batting average won’t add anything to the equation.