The Metrics of Winning: How Fantasy Baseball Can Help Us Understand Data Big and Small

Stat-heads rejoice: baseball season is here, which means it’s time to start talking wins-above-replacement, weighted-on-base-average, and fielding-independent-pitching. In this post I’ll share a simple formula for winning your fantasy league, but before I lose the less-statistically-inclined of you in the outfield weeds, let’s look at the bigger picture, which has little to do with numbers and everything to do with the stories they tell.

It’s been 10 years since Michael Lewis’s Moneyball introduced a revolution in sports management to the public at large — the notion that knowing which stats make valuable metrics, and more importantly knowing which metrics your competitors undervalue, can give you a major advantage, even if you’re on a David-sized budget in a league of Goliaths. Today, Big Data is the hot topic, and while it certainly has transformative potential everywhere from retail to climate science, it takes some serious (and seriously expensive) technology to collect and store those reams of data, not to mention a hardcore number cruncher to analyze them.

That’s why it’s time to think about the Age of Big Data as the Age of Small and Medium Data Too, and time to take the lessons of Moneyball and run with them. As Elena Klau wrote in Adweek last week, you don’t need Big Data to make smart decisions — you just need the right data, and you need it to yield better results than your competitors’ data. Most importantly, whether you’re a data scientist or not, you need to be able to make sense of that data, which means starting with simple, manageable metrics that can actually produce actionable insights.


The biggest problem with statistics is that they’re meaningless unless you understand them in context. Is a 30% success rate good or bad? If you’re a professional baseball player, batting .300 gets you into the Hall of Fame; if you’re an NFL kicker, hitting 30% of your field goal attempts will get you cut in training camp. A 30% conversion rate might make you an e-commerce supergenius, but if you’re an air traffic controller, a 30% success rate will get you thrown in prison. We may intuitively understand that each of those jobs has certain benchmarks for success or failure, but we only intuit those benchmarks after we see how lots of different people perform at those jobs and figure out who’s better than the others.

But the thing about competition, whether it’s in sports management or in marketing, is that finding a winning formula isn’t about perfection or even hitting benchmarks — it’s about being better than the rest. And sometimes, “better” or “best” is not a matter of one simple metric, but several combined. So how do you put all those metrics together to make the perfect Voltron-esque mega-metric? Using a simple stats trick, even the non-stat-heads can make their numbers come together.


If you took a statistics course in college, you probably learned, and then quickly forgot, all about standard deviations and Z-scores. Don’t worry, I’m not even going to try to refresh your memory. Just think about Z-scores as a ratings system: 0 is average; -1 is bad, -2 is terrible, -3 is dreadful, and so on; 1 is good, 2 is great, 3 is amazing, and so on. A Z-score simply tells you how much better or worse than average an individual is, measured in standard deviations. The range of Z-scores depends on how many outliers exist and how many individuals are roughly average, but for roughly bell-shaped data a -2 lands in about the bottom 2.5% and a 2 in about the top 2.5%. While a Z-score of 3 is amazing, a mind-blowing outlier could go way beyond that: in 1920, when Babe Ruth hit 54 home runs in a league where the average was just over 5 and the runner-up hit 19, the Bambino’s home run Z-score was a ridiculous 7.3.
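For anyone who does want the arithmetic, a Z-score is just a value minus the group average, divided by the group's standard deviation. Here is a minimal Python sketch of that calculation. The home run totals are made up for illustration (they are not real league data), and the choice of the population standard deviation is an assumption, since either version could be used.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Return each value's distance from the mean, in standard deviations."""
    avg = mean(values)
    spread = pstdev(values)  # population standard deviation (an assumption)
    if spread == 0:
        # Everyone is identical, so everyone is exactly average.
        return [0.0 for _ in values]
    return [(v - avg) / spread for v in values]

# Hypothetical home run totals for a small group of hitters (illustrative only).
home_runs = [5, 8, 12, 3, 7, 40]
scores = z_scores(home_runs)

for hr, z in zip(home_runs, scores):
    print(f"{hr:>3} HR -> Z = {z:+.2f}")
```

The Z-scores for any group always add up to zero, which is what makes 0 mean "average" no matter what is being measured.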

The nice thing about Z-scores is that they’re easy to calculate in Excel and they can give you normalized ratings for all sorts of different data sets, which you can then add up to calculate a “best overall.” For instance, if you’re buying a new car, you might be looking at all kinds of disparate data: price, safety rating, gas mileage, durability, and so on. If you put those numbers together for every car you’re considering, calculate Z-scores for each category, and then add up those Z-scores, you’ll see which one is actually the best. For categories where lower is better, such as price, flip the sign of the Z-score before adding it. Say the price is twice as important to you as any other figure; simply double the price Z-score in your tally to give it that extra weight. You could do the same sort of analysis for buying stocks or determining which social media channels will yield the best ROI for your marketing campaign. I just happen to be a baseball nerd, and this is my (now-open) secret for winning my fantasy league.
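The car comparison could be sketched in Python along these lines. The cars, their figures, and the names `direction` and `weight` are all invented for illustration. One assumption worth flagging: because a lower price is better, the sketch flips the sign of the price Z-score before weighting it.

```python
from statistics import mean, pstdev

# Hypothetical cars and figures, invented purely for illustration.
cars = {
    "Car A": {"price": 22000, "safety": 4, "mpg": 32},
    "Car B": {"price": 27000, "safety": 5, "mpg": 28},
    "Car C": {"price": 19000, "safety": 3, "mpg": 36},
}

# +1 means higher is better; -1 means lower is better (price).
direction = {"price": -1, "safety": 1, "mpg": 1}
# Price counts double; every other category counts once.
weight = {"price": 2.0, "safety": 1.0, "mpg": 1.0}

def overall_scores(items, direction, weight):
    """Sum weighted, direction-adjusted Z-scores across all categories."""
    totals = {name: 0.0 for name in items}
    for category in direction:
        values = [stats[category] for stats in items.values()]
        avg, spread = mean(values), pstdev(values)
        for name, stats in items.items():
            z = 0.0 if spread == 0 else (stats[category] - avg) / spread
            totals[name] += weight[category] * direction[category] * z
    return totals

totals = overall_scores(cars, direction, weight)
best = max(totals, key=totals.get)

for name, total in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {total:+.2f}")
print("Best overall:", best)
```

With these made-up numbers the cheapest car comes out on top, largely because price carries double weight. Change the weights and the ranking can change with them, which is the point: the tally reflects what you decide matters.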


Fantasy baseball is all about the numbers. Every owner in a fantasy league fields a team composed not of real baseball players, but of real baseball players’ statistics. That is to say, while Matt Kemp may not wear my team’s uniform, every time he gets a hit or steals a base my team accumulates those stats. Most leagues use five statistical categories for hitters: home runs, runs, runs batted in, batting average, and stolen bases. Some players may be major contributors in one category and detractors in another. The best way to understand how they actually help or hurt a fantasy team is to look at their Z-scores.
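To make the contributor-versus-detractor idea concrete, here is a rough Python sketch that breaks a few hitters down by category. The players and their stat lines are invented, not real projections, and the pool is far smaller than a real league's. Treating batting average like the counting stats is also a simplification.

```python
from statistics import mean, pstdev

CATEGORIES = ["HR", "R", "RBI", "AVG", "SB"]

# Invented stat lines for hypothetical hitters; not real projections.
hitters = {
    "Slugger":     {"HR": 40, "R": 85,  "RBI": 110, "AVG": 0.260, "SB": 2},
    "Speedster":   {"HR": 8,  "R": 95,  "RBI": 50,  "AVG": 0.285, "SB": 45},
    "All-Rounder": {"HR": 25, "R": 100, "RBI": 85,  "AVG": 0.300, "SB": 25},
    "Bench Bat":   {"HR": 10, "R": 55,  "RBI": 45,  "AVG": 0.245, "SB": 5},
}

def category_z(players, category):
    """Z-score of every player within a single statistical category."""
    values = [line[category] for line in players.values()]
    avg, spread = mean(values), pstdev(values)
    return {
        name: 0.0 if spread == 0 else (line[category] - avg) / spread
        for name, line in players.items()
    }

by_category = {cat: category_z(hitters, cat) for cat in CATEGORIES}
totals = {
    name: sum(by_category[cat][name] for cat in CATEGORIES)
    for name in hitters
}

# Print one row per player, best total first.
print(f"{'Player':<12}" + "".join(f"{cat:>7}" for cat in CATEGORIES) + f"{'Total':>8}")
for name in sorted(totals, key=totals.get, reverse=True):
    row = "".join(f"{by_category[cat][name]:>+7.2f}" for cat in CATEGORIES)
    print(f"{name:<12}{row}{totals[name]:>+8.2f}")
```

In this toy pool the slugger leads in home runs but is dragged down by a negative stolen base score, so the balanced hitter ends up with the highest total.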

In the graphs below, you can see how the ZiPS projection system expects the 2013 MLB season to play out. As the distribution curves indicate, most categories have roughly equal numbers of good and bad performers, with batting average being more tightly bunched in the middle and stolen bases skewed toward zero, as there are far more slowpokes than speedsters in the majors right now. The real story, though, is in the top 10 players: Miguel Cabrera, who last year became the first player since 1967 to win the Triple Crown, is only projected to have the third-best stats, mainly because of his negative Z-score for stolen bases. Giancarlo Stanton may be the presumed home run king, but because he doesn’t contribute as much in other areas he’s ZiPS’ number six overall.

And who’s the fantasy MVP? The Z-scores tell us it’s the man whom many people thought should have beaten Cabrera for the real MVP trophy last year: Mike Trout.



