This is one area where I part ways with many of my online friends who are statheads: the mantra that small samples offer no valuable information. Many SABR devotees will instantly dismiss Pitcher vs Batter matchups, since you’re dealing with a tiny handful of ABs. They will further criticize managers like Girardi, who look at this info, for using outdated methods. Check out this exchange from a while back by Rob Neyer on ESPN chat:

John (New York, NY): Rob, the sample size of batter/pitcher matchups is of particular interest to me. Obviously a sample size of 5-10 PAs against a single pitcher does not yield any useful data. However, when you consider that in those 5-10 PAs, a single batter is only facing the repertoire of a single pitcher, my question is how many PAs are required before the data becomes significant? 20? 50? More? What do you think?

Rob Neyer: More than 20. I’m not sure if 50’s enough. I’m not sure if any batter has ever faced a pitcher enough times to show us anything truly meaningful. I think what makes more sense is looking at how a hitter has fared against *types* of pitchers.

I’m a fan of Rob’s, a regular reader of The Sweet Spot, and generally enjoy his work, but this couldn’t possibly be more wrong-headed. Why on Earth would you value a hitter’s ABs against generic lefthanders more than his ABs against the specific one he’s facing? Just for the larger sample? That’s just silly. Every pitcher is different in terms of repertoire, release point, velocity, how his ball moves, etc. Some batters see the ball great out of one pitcher’s hand and just can’t pick it up against the next. The more specific the info, the better. But a by-product of specificity is that samples get smaller and smaller, and thus get dismissed by those who see the game only by the numbers. Because of this, some SABR devotees wind up completely missing the situational side of the game, which is how most baseball professionals look at it.

I’m not anti-stat and I fully understand the concept of statistical noise, but there is such a thing as qualitative analysis and it is rightfully used in baseball all the time. What the quants don’t get is that when a manager looks at this info, he’s not looking at the numbers. He’s looking at the individual plate appearances (BR has them, subs req’d) and outcomes to see if there’s anything useful there. Sometimes there is, sometimes there isn’t. Here’s a link to give an outline of what I’m talking about.

The ‘Play Index’ will say things like “Line drive to Left” “Infield ground ball to SS” or “Strikeout”. So a player might be 2-14 against a certain pitcher, but 9 of the 12 outs were hard line drives. That’s valuable info. It tells you he hits the pitcher well and has just had bad luck that’s due to turn around. When Girardi cites stuff like this, you never hear him say “I batted Hitter X vs Pitcher Y because he was 2-14 against him”. Rather, he says things like “I batted him because he’s had good ABs against that pitcher”. The ‘Play Index’ is the kind of stuff he’s referring to in that much-maligned binder of his, and I’ll bet his info even goes beyond what is publicly available at BR, with advanced scouting reports and whatnot. But it’s also a great example of how there is valuable info in small samples, if you look at the game in a more situational way, and not just purely statistically.

 

26 Responses to Why Batter vs Pitcher matchups matter

  1. Disco says:

    I think when he says types he means GB pitchers and FB pitchers, not so much left v right. In that case, yeah, looking at how a batter fares against a GB-sinker pitcher or a strikeout fastball pitcher is useful

    • Steve S. says:

      Sure, and I didn’t mean to imply a pitcher’s profile is limited to their platoon splits. There’s power/finesse, GB/FB, and the hitter’s scouting report/ profile to consider.

  2. smurfy says:

    and I guess the relevant sample size would be influenced by how recent the data is, like if the batter has had better success against the type or the individual recently, that should outweigh the long term static stat.

  3. “What the quants don’t get is when a manager looks at this info, he’s not looking at the numbers. He’s looking at the individual plate appearances (BR has them, subs req’d) and outcomes to see if there’s anything useful there. Sometimes there is, sometimes there isn’t.”

    This sounds accurate, but quite simply, how do you know?

    • Williamnyy23 says:

      I generally agree with your position, but I think qualitative analysis has to be just that. You can’t look at 7 ABs on paper and come to a conclusion because it could involve two line drive outs, a missed call by an umpire and one ball hit to the base of the wall. Now, if you observed those 7 ABs and noticed that they were all weak groundouts and pop-ups, you’d have some useful information.

      What I object to, and I am sure Neyer does too, is a manager simply looking on paper for a conclusion that doesn’t have the necessary sample size to mitigate concern over the anecdotal issues mentioned above. We really don’t know if managers are taking a more in-depth look into the small sample, or simply making their decision off a sheet of paper.

      • Steve S. says:

        You can’t look at 7 ABs on paper and come to a conclusion because it could involve two line drive outs, a missed call by an umpire and one ball hit to the base of the wall. Now, if you observed those 7 ABs and noticed that they were all weak groundouts and pop-ups, you’d have some useful information.

        Exactly, if the results are mixed, you throw it out. But if there’s a strong trend emerging, you take it into consideration, among other factors.

        But Neyer grossly overstated his case by saying this-

        More than 20. I’m not sure if 50’s enough. I’m not sure if any batter has ever faced a pitcher enough times to show us anything truly meaningful.

        Alex Rodriguez has faced Bartolo Colon 57 times. There’s nothing meaningful there? From a purely statistical standpoint maybe not, but you’d have to be blind not to know that Alex owns him.

        http://www.baseball-reference.com/play-index/batter_vs_pitcher.cgi?pitcher=colonba01#gotresults&pitcher=colonba01&min_year_game=1997&max_year_game=2009&post=1&opp_id=NYY&bats=any&opponent_status=&c1criteria=&c1gtlt=eq&c1val=0&c2criteria=&c2gtlt=eq&c2val=0&orderby=PA&orderby_dir=desc&orderby_second=Name&orderby_dir_second=asc&ajax=1&submitter=1

        • B says:

          “Alex Rodriguez has faced Bartolo Colon 57 times. There’s nothing meaningful there? From a purely statistical standpoint maybe not, but you’d have to be blind not to know that Alex owns him.”

          I find this statement a little curious. A 1.515 OPS for ARod in those 57 PA’s is certainly some nice ownage. It may or may not be statistically significant, I don’t know (but keep in mind ARod has faced hundreds of pitchers over his career, so some of those matchups are bound to differ from his mean level of performance at a statistically significant level purely by chance – Type I error – and how many do depends on what you define your alpha to be). What I really find curious is putting this statement into the larger context of whether that’s useful information or not. The link says that data is from 1997-2009. I don’t know how much of the data came from each of those years, but I think we’re all on board with the notion that Colon and ARod are not at all, in 2011, the players they were in 1997, or even 2005 for that matter (Cy Young for Colon and MVP for ARod), right? So if we’re looking at a matchup between them in 2011…why would we think that data is useful for deciding whether it’s a better matchup for the Yankees than we would otherwise expect without that data? Colon isn’t the same pitcher, ARod isn’t the same hitter…it’s a completely different matchup now. Who knows if it’s still a good matchup for ARod, assuming it really was in the past (a dubious assumption, I think)?

          Projecting the future is hard. I don’t think any reasonable person would think statistics are the only possible useful indicator for predicting the future; there are definitely qualitative/scouting methods out there that are useful. The problem is when managers, who don’t really understand statistics, start trying to project the future using statistics as their tool; they often do it wrong. Using 20 PA’s of performance from some split to make that decision – that’s trying to project the future using statistics, and it’s the wrong way to do it (the additional variance you add from that small sample data greatly outweighs the additional “explanatory power” the data contains). By the way, things like line drives or strikeouts or other things you get from the play index – those aren’t qualitative. They’re quantitative, and are good data to be used for making statistical forecasts more optimal (often the inputs of an outcome are more useful for statistical forecasting than the outcome itself. Example – it’s why point differential is more meaningful than actual record in pretty much every sport). Basically any outcome-based approach is going to be quantitative, and needs to be used properly as you’d do with any other statistics. Things I’d really call qualitative would be scout-driven things like how a guy sees the ball, maintains his balance, identifies pitches (all against that individual pitcher), etc. And even those are subject to sample size considerations, though I’d guess the sample size requirement is much smaller.
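The Type I error point in the comment above can be put into rough numbers. This is only a sketch under loud assumptions: the figure of 300 matchups is hypothetical (the comment says only "hundreds of pitchers"), the function names are made up for illustration, and each matchup is treated as an independent test that flags a no-difference matchup with probability exactly alpha.

```python
def expected_chance_outliers(num_matchups: int, alpha: float) -> float:
    """Expected number of matchups flagged 'significant' when none truly differ.

    Assumes every matchup is tested at the same alpha (hypothetical setup).
    """
    return num_matchups * alpha


def prob_any_chance_outlier(num_matchups: int, alpha: float) -> float:
    """Chance that at least one matchup is flagged purely by luck.

    Assumes the tests are independent, which real matchups may not be.
    """
    return 1.0 - (1.0 - alpha) ** num_matchups


if __name__ == "__main__":
    matchups = 300  # hypothetical count, not taken from the discussion
    for alpha in (0.05, 0.01):
        print(
            f"alpha={alpha}: "
            f"expected flagged={expected_chance_outliers(matchups, alpha):.1f}, "
            f"P(at least one)={prob_any_chance_outlier(matchups, alpha):.4f}"
        )
```

Under these assumptions a hitter who is truly the same against everyone would still be expected to show a handful of "significant" matchups, which is why a single extreme line does not settle much on its own.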

  4. Alex says:

    “The ‘Play Index’ will say things like “Line drive to Left” “Infield ground ball to SS” or “Strikeout”. So a player might be 2-14 against a certain pitcher, but 9 of the 12 outs were hard line drives. That’s valuable info, that tells you he hits the pitcher well, just in bad luck that’s due to turn around.”

    Neyer is not complaining about the use of batting average, or OBP, or HR over 14 plate appearances. Neyer is complaining about the use of anything over 14 plate appearances. The approach you describe might lead to better analysis of statistics. But the sample size is simply too small to gain anything from. The point isn’t that Neyer doubts some players have had good/bad at bats against other players, he simply doesn’t believe they have any predictive value. And that’s reasonable.

    It’s like if I said “a three game sweep doesn’t tell us the Royals are better than the Yankees because it’s only three games” and you came back with “yeah, but the Royals outscored the Yankees by 11 runs during the series and looked to be playing better.” I wasn’t denying that the Royals had a better series, I was simply denying that the series meant anything.

  5. oldpep says:

    I agree that sample size means different things when applied to different situations, and batter vs pitcher probably doesn’t require as big a sample as a lot of other things. But I think you still need more than 20 and likely even more than 50 to be able to use it as a predictor. That includes the subjective stuff like how hard the ball was hit and the SS made a great play (and especially ‘he made an out, but he had a good AB’).

    Before saying it’s useful, it really has to be tested, like all of the other saber ideas.

  6. Steve Axelrod says:

    Sample size is often misunderstood as being something that can be determined in advance of the data. Let me first state that I agree with the comments made in the post, that things like quality of the at-bat matter more than the end result, due to the random chance of whether a hard hit ball went right to a fielder or in between fielders.

    To illustrate my point about statistics, consider that a batter has had 10 at bats against a particular pitcher, and hit tape measure home runs each time. Now that would not be adequate to predict the same thing would happen the next time, but it sure is statistically meaningful that this batter will likely do well against the pitcher over the next several at bats.

    Where the traditional emphasis on much larger numbers comes into play is when you’re trying to lay out the difference between something like a .300 or a .250 batting average probability. In cases like that you *can* pre-specify the number of at-bats needed. The common rule of thumb is that a count of n events has a standard deviation of sqrt(n), or 1/sqrt(n) in relative terms. In other words, if a batter had 10 hits in 30 at bats, the error bar (uncertainty) on the hit count is sqrt(10), or ~3, and the (1 standard deviation) relative uncertainty in batting average is 1/sqrt(10) ~ 30%, so that the batting average is .333 plus or minus about .100.

    If the pitcher and batter have remained consistent in their performance (a big if, as others have alluded to) then the probability of the batter getting a hit is about 33% plus or minus 10%. At this point one can get into fundamental discussions of what probability means. I’m a physicist so I take it to be my best guess of the chances that the event (a hit in this case) actually happens. Mathematicians may differ in this interpretation.

    A key takeaway here is that the quantity one uses for the variable “n” in the sqrt(n) formula is not the number of at bats, but the number of events one is interested in, in this case hits. It could as well be the number of quality at-bats. The other key takeaway is that the requisite statistics depend on what you’re trying to learn about. To distinguish between a mediocre and an excellent batting average indeed takes a lot of hits. But if the batter has had a lot of hits in very few at bats, it doesn’t take nearly as much time to conclude he’s really pretty good against that pitcher.
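The sqrt(n) rule of thumb in the comment above is easy to check with a few lines of Python. This is a minimal sketch: the function names are hypothetical, and the binomial standard error is added here only as a comparison point; it is not something the commenter used.

```python
import math


def rule_of_thumb_error(hits: int, at_bats: int) -> tuple[float, float]:
    """Batting average and its error bar using the 1/sqrt(hits) rule.

    The relative uncertainty is taken as 1/sqrt(hits), so hits must be > 0.
    """
    average = hits / at_bats
    relative_error = 1.0 / math.sqrt(hits)
    return average, average * relative_error


def binomial_standard_error(hits: int, at_bats: int) -> tuple[float, float]:
    """Batting average and the textbook binomial standard error.

    Treats each at-bat as an independent trial with the same hit chance.
    """
    average = hits / at_bats
    return average, math.sqrt(average * (1.0 - average) / at_bats)


if __name__ == "__main__":
    for label, fn in (("rule of thumb", rule_of_thumb_error),
                      ("binomial", binomial_standard_error)):
        avg, err = fn(10, 30)  # the 10-for-30 example from the comment
        print(f"{label}: {avg:.3f} +/- {err:.3f}")
```

For 10 hits in 30 at-bats the rule of thumb gives an error bar of roughly .105, and the binomial formula gives roughly .086. Either way the uncertainty is wide enough to cover the gap between a .250 and a .300 hitter.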

  7. Jim says:

    Steve, I’m not sure you aren’t making Neyer’s point for him.

    What you’re really saying is that 14 or 20 at-bats, by themselves, don’t tell us enough to be statistically significant. When you look at the qualitative data, then, what you’re really saying is that you don’t have better quantitative data. So you look at what you have, and try to divine what might appear from a much larger sample size.

    However, in a B v. P matchup scenario, even to get to 20 at-bats against a pitcher (roughly 5-7 games’ worth) is usually going to take more than one season, which clouds even the qualitative data, especially given pitchers’ variability. So you would be better off looking at how a batter fares against similar pitchers (for example, lefties who throw a certain type of repertoire, finesse vs. power, etc.) rather than looking at whether he hit the ball hard three times against Pitcher A in 2008. Recent qualitative data can be useful, but even if a player generally hits lefties well, the fact that Batter B had a bad day three months ago vs. southpaw Pitcher C doesn’t mean he can’t hit Pitcher C. He might just have been off that day, or Pitcher C was really on his game, or what have you. The limitations of the small sample size still show up, because you just don’t know what all the variables were in a handful of at-bats. Is it a data point? Sure, but likely of marginal added value for decision making.

    • mikkyld says:

      Statistics is a quantitative field; it cannot by definition measure the quality of anything – just the probability that X will happen (anything from a 1b to a widget failing). Analyzing ABs can be done both quantitatively, in which case Neyer would be right about not drawing any conclusions from a small sample size, and qualitatively (how well the AB went as viewed by an expert), in which case a small sample size can indeed be pertinent.

      Note I said “can” and not “will.” Neyer continues to say the same thing (Today he said: “I appreciate the kind words, but the notion that 20 plate appearances means something couldn’t possibly be more wrong-headed. Generally speaking.”) However, the one who is wrong here is Neyer.

      Well at least on this topic – anyone who roots for the Yankees is wrong in general from my view :)

    • Steve S. says:

      I don’t disagree, except when this info is dismissed entirely based on sample size. A smart manager takes everything into account, the hitter’s strengths/weaknesses, the pitchers repertoire, the advanced scouting reports. And yes, the B vs P match up. Weigh everything and make a decision, but people who refuse to even look at this data based on sample size are taking a perfectly valid concept and over-applying it. And misunderstanding how to correctly use this B vs P info, as I detailed.

  8. LordD99 says:

    Steve, I’m not sure you and Rob Neyer are too far apart on what you’re saying. Unfortunately, for Rob, he is sometimes a bit short and flip in his responses in his chats, which can leave a wide gap between what it appears he’s saying and what he believes.

    I find it odd that Girardi gets criticized for being prepared, and having access to data in the format he likes in the dugout. One reason Cashman selected Girardi was because he’s data and stats driven, which was an area where Cashman used to clash with Torre, who was obviously a bit old-school in that regard. Girardi is an engineering graduate from Northwestern, who also was an MLB player and a catcher, and who had a reputation for calling a good game. So I think it’s safe to assume he’s comfortable with both numbers and understanding how hitters react to certain pitches. Based on that, there really is no such thing as a small sample size. It’s just a data point that needs to be used in conjunction with other data points. If a hitter is 0-7 against a pitcher with five strikeouts, that’s not enough to determine much, yet the manager should have access to that data, because he can then compare that 0-7 against a class of pitchers, say righthanded power pitchers with a good slider. If the hitter overall is 17-100 against those types of pitchers, then the manager can use that 0-7 to determine that it’s just not a good matchup. Send up the pinch hitter.

    There are other important aspects to consider that you noted, such as how hard he hits the ball against the specific pitcher (maybe he’s just hitting in bad luck), how well the batter is swinging or not swinging the bat at that particular time, and simple visual observation of a batter vs. a pitcher. You don’t have to be a baseball player to see that there are clearly batters who do read the ball well off of certain pitchers. There are also batters who are convinced they can’t hit certain pitchers. The fact that the batter believes it will make it true. If the manager knows this, he has to take that into account.

    When Girardi looks into his binder for some data, let’s not assume all he’s looking at is an 0-7 line. He’s said as much himself. He’s looking at much more.

    • Steve S. says:

      Agree completely on all counts. I was disappointed to see Rob Neyer’s flip and dismissive first comment over at BTF today, but I’ll take your word for it that he’s thought about things more than that and isn’t as locked in ideologically as some others are. As I said in the piece, I’m a fan of his work and his take is generally more nuanced than that comment would indicate.

      • Moshe Mandel says:

        I thought he was flip because he felt you misrepresented his comment by limiting the word “types” to handedness when he meant more. If he was dismissive, he wouldn’t have linked it at his blog and given us the traffic.

        • Steve S. says:

          OK, I didn’t know he did that. I’ll make sure to drop him a thank you. I just read the first comment over at BTF and it seemed dismissive. He said in the comment he didn’t even read the article (DRTFA) so I thought he was blowing me off.

  9. RMR says:

    I’m not sure you do understand the small sample concept. If the guy has just 12 PA, ANY outcome could very well be a result of chance. You can’t just look at the sample size; you have to look at the observed data as well. It’s a spectrum: the smaller the sample, the more extreme the observation has to be for it to be outside of the “quite possibly due to chance” range. It’s not a matter of looking at the 2-12 and saying he struggles, or that he hit the ball well 10 times and therefore owns him. It’s that neither of those occurrences is so extreme as to be meaningful. 10 extra base hits or strikeouts, and now you’re probably getting somewhere. But there are few cases which are so extreme as to make the small sample meaningful.
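The "smaller the sample, the more extreme the observation has to be" idea in the comment above can be sketched with an exact binomial tail. The assumptions are loud ones: the .250 true hit rate, the 5% cutoff and the function names are all hypothetical choices for illustration, and each plate appearance is treated as an independent trial.

```python
from math import comb


def prob_at_least(hits: int, plate_appearances: int, true_rate: float) -> float:
    """Exact binomial probability of getting `hits` or more by chance alone."""
    return sum(
        comb(plate_appearances, k)
        * true_rate ** k
        * (1.0 - true_rate) ** (plate_appearances - k)
        for k in range(hits, plate_appearances + 1)
    )


def min_hits_to_surprise(plate_appearances: int, true_rate: float,
                         alpha: float = 0.05):
    """Smallest hit total whose chance probability falls below alpha.

    Returns None if even a perfect record would not clear the cutoff.
    """
    for hits in range(plate_appearances + 1):
        if prob_at_least(hits, plate_appearances, true_rate) < alpha:
            return hits
    return None


if __name__ == "__main__":
    # Hypothetical sample sizes for a hitter assumed to be a true .250 hitter.
    for pa in (12, 20, 50):
        needed = min_hits_to_surprise(pa, 0.25)
        print(f"{pa} PA: {needed} hits needed ({needed / pa:.3f} average)")
```

With these assumed inputs, 12 plate appearances require 7 hits (a .583 average) before the result drops below the 5% chance range, which is the sense in which only very extreme lines stand out in a tiny sample.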

    • mikkyld says:

      It doesn’t matter if it is extreme or not – quantitatively you cannot tell what 0-14 or 5-14 or 10-14 means. It could indeed be a result of chance, even if the 10 were all HRs.

      However an expert watching can see whether the pitcher is on or off, fooling people or not and how the hitter went about his task. If he looked fooled every time and somehow made contact and the 10 HRs all just went around the Pesky pole, that would certainly be a different thing from never being fooled and driving those 10 (on every different pitch the pitcher has) 450-500 feet to LF, CF and RF.

      In the latter case, you could qualitatively assert that this guy can hit that guy – even when that guy has his good stuff.

      • B says:

        Maybe. Maybe not. As I say above, I believe there are things scouts can see that are outside the realm of the statistics we keep, and suspect many of those things require a much smaller sample size than the statistics we use do…..but sample size still applies. When you’re talking about 14 PA’s (and remember, we’re talking about an individual batter/pitcher matchup, so they’re going to be spread out over quite a bit of time, which complicates things), I’d hesitate to put too much confidence in that. It is possible it can tell them a lot, though. I dunno – the same concepts of sample size, and the tradeoff between added variance and added explanatory power, still apply. I think the really smart organizations are probably trying to figure out how to best blend scouting and statistics right now – it’s the natural progression from the statistical revolution. As just a normal fan, I have no idea where they are in figuring that out.

      • Steve S. says:

        It doesn’t matter if it is extreme or not – quantitatively you cannot tell what 0-14 or 5-14 or 10-14 means. It could indeed be a result of chance, even if the 10 were all HRs.

        Bloop hits over an infielder’s head are largely random events, as are fly ball hits. Line drives are more indicative of skill, but even they have a large component of luck to them.

        With that being said, characterizing HRs as devoid of skill and random events is a statement that I find difficult to defend. To be sure some fly balls are outs and others are HRs depending on where you’re playing, but taking 10 such events and calling them “chance” gives the impression you don’t factor in skill whatsoever.

  10. Great posts by all. And I’m a Neyer fan, too, “generally speaking,” as Neyer says in his Monday Mendozas. But let’s try starting with the opposite thesis, that every at-bat affects all subsequent at-bats (between any one pitcher and one batter). This is going to be anecdotal, not quantitative, so it might get long. But here goes.

    One problem that I had with Neyer’s assertion when I first read it was experiential. As we become more and more quant, do we forget what it was like to play? It only took me what?…1 or 2 or 3 ABs against a kid as early as Pony League to establish the mindset that I either could or could not hit the guy. In basketball as early as junior high, I knew after 2-3 times down the court whether I could shut down a 25-point scorer and could have probably guessed with great accuracy what sort of an offensive night I’d have so long as the defense was “man” and the same guy guarded me. It doesn’t take long for an aware athlete to assess himself against his opponent. But that varies from one to another.

    Greg Maddux goes to the Dodgers late in his career, and young pitchers are amazed at his ability to predict exactly what will happen not only in an AB, but on the next pitch even without knowing the catcher’s sign. Early in his career – like Girardi – Maddux had to glance down at his little notebook. By the time he gets to the Dodgers, he just turns to that young pitcher beside him on the bench and says, “Get ready to duck!” Four seconds later, the batter sizzles a line foul to the point in the dugout where the young pitcher’s head was when Maddux issued his warning.

    What happens when the Russian coach takes Tretiak out of goal in the 1980 Olympics? The Russian coach is from the Neyer school. Can’t possibly make a statistical difference. After all, there will only be 12 to 30 shots on goal by the Americans in the time left. And (for example…who knows the exact percentages?) Tretiak might stop 98% of shots and the other guy only 97%, but that’s “meaningless” (in his own words this morning about 20 ABs) to Coach Neyer. But the Americans – to a man – all later describe this unbelievable lift they get when they look down the ice and see that Tretiak is not the one coming into goal, just like I would feel if my assignment was Kip Lasauskus from Macon instead of Larry Barnard from Cameron. They both scored 25 points/game, but I could completely neuter Kip and didn’t have a prayer of slowing down Barnard. The Americans suddenly had a confidence with Tretiak removed that they did not have when the match began.

    I just can’t help but think that the young batter who goes 0-for-4 his first time against Max Scherzer with two Ks is much worse situated than the guy who goes 3-for-5 with one ding and another hard-hit ball as they launch on into facing each other for the next 15 years. Or that Max Scherzer is not more likely to keep dominating the former than the latter.

    Now, do the same 5 or 10 or 20 ABs bode similarly for Maddux, Ambiorix Burgos and Jeremy Affeldt? The inside word from the Royals organization was that Burgos and Affeldt didn’t even bother to read advance reports on opposing hitters. The grounds crew stationed in the bullpen (I knew two of them) would also tell you that Burgos and Affeldt were so busy playing little kids’ games in the dirt or reading porn in the bullpen that they didn’t even watch the game most of the time. They might come in to pitch to a guy they had faced 20 times without even knowing what pitches they had thrown to him on what counts, what the outcomes were, or maybe even the guy’s name, position, and whether they had faced him AT ALL.

    But Maddux would know all of that. And I can’t help but think that Maddux would take the position that EVERY at-bat affects EVERY subsequent at-bat against any one pitcher.

    • B says:

      Ok so you have a theory. Next step – test it out. See if the evidence supports the notion that it might be correct. I’m skeptical – smart, very stat-savvy guys who have also been employed by MLB teams, like Tango and MGL, seem to think splits like that never tell you anything worthwhile (at least that’s my own understanding of their view), and they’re the type of people whose knowledge I’d defer to. That said, I’m not sure I’ve ever seen anything on this subject specifically, and I might just be completely in the wrong here….so the important thing is “what does the evidence say”?

    • Steve S. says:

      Terrific stuff, and frankly comments like this are a blog post in and of itself. One worth reading. Do you blog anywhere? Or would you like to guest blog on occasion? We often have slots open when some of our regulars can’t post.

  11. John says:

    I love that somebody out there realizes this. More often than not — and pretty close to all the time — those things happen for a reason. Statisticians think it’s noise, but when a guy keeps getting hits or slamming the ball off the same pitcher, or looking ridiculous for that matter, there’s a reason for it. Nice post.
