Analyzing Strike Zone Data From the Statcast Database

With my Statcast database in Oracle nice and handy, let’s have some more fun and look at strike zone data. Statcast records a pitch’s location with two variables, plate_x and plate_z, which work like the x and y coordinates on a graph. Both are measured in feet, not inches.

The strike zone is the width of home plate, which is 17 inches. However, the rule book only requires part of the baseball to pass through the strike zone for a pitch to be a strike, so in practice the zone is slightly wider than 17 inches. A baseball is roughly three inches in diameter, and since plate_x tracks the center of the ball, a pitch can be centered up to one ball radius (about 1.5 inches) outside either edge and still clip the zone. That makes the effective width about 20 inches. The middle of the plate is a plate_x of 0 (like the origin (0, 0) of a graph). From the catcher’s perspective, pitches to the left have negative plate_x values and pitches to the right have positive values. Since the strike zone is 17 inches (1.417 feet) wide, its left edge is at -0.7083 and its right edge is at 0.7083. I will repeat, though, that the zone is technically a little wider than that.

Plate_z, meanwhile, is much easier to interpret: it is simply the ball’s height above the ground in feet. While every batter has the same strike zone width, each batter’s strike zone height is unique. Because of this, Statcast records, on every pitch and in the same units as plate_z, the top of the zone (sz_top), the midpoint between the belt and the shoulders, and the bottom of the zone (sz_bot), the bottom of the kneecap.
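To make the geometry concrete, here’s a small Python sketch that turns the plate width into the plate_x edge values above. It treats plate_x as the position of the ball’s center and assumes a ball diameter of 2.9 inches; the constant and function names are my own, not anything from Statcast.

```python
# Strike zone width geometry in feet (plate_x units).
PLATE_WIDTH_IN = 17.0
BALL_DIAMETER_IN = 2.9  # assumption: regulation balls are roughly 2.9 in across

half_width_ft = PLATE_WIDTH_IN / 2 / 12      # rule-book edge of the plate
ball_radius_ft = BALL_DIAMETER_IN / 2 / 12   # how far the ball's center can sit outside


def touches_zone_horizontally(plate_x: float) -> bool:
    """True if a ball centered at plate_x (feet) overlaps the plate's width."""
    return abs(plate_x) <= half_width_ft + ball_radius_ft


print(round(half_width_ft, 4))                   # 0.7083
print(round(half_width_ft + ball_radius_ft, 4))  # 0.8292
```

Under these assumptions, a pitch centered at plate_x = 0.8 still clips the zone, while one at 0.9 misses it entirely.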

Now let’s get into some fun querying. What were the lowest pitches that were called strikes?
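Here’s roughly what that query can look like, sketched in Python against SQLite so it runs anywhere. The table name statcast is hypothetical, and I’m assuming Baseball Savant’s pitch-level columns (batter, the hitter’s MLBAM ID; description; plate_z). In Oracle, LIMIT would become FETCH FIRST n ROWS ONLY.

```python
import sqlite3

# Hypothetical table "statcast" holding Baseball Savant pitch-level columns.
# plate_z is the ball's height in feet; multiply by 12 for inches.
LOWEST_CALLED_STRIKES = """
    SELECT batter, plate_z, plate_z * 12 AS height_in
    FROM statcast
    WHERE description = 'called_strike'
      AND plate_z IS NOT NULL
    ORDER BY plate_z ASC
    LIMIT ?
"""


def lowest_called_strikes(conn: sqlite3.Connection, n: int = 10) -> list:
    """Return the n lowest pitches that were called strikes."""
    return conn.execute(LOWEST_CALLED_STRIKES, (n,)).fetchall()
```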


The lowest called strike was thrown to Michael Brantley and was 0.93 feet (11.16 inches) off the ground. That’s pretty low, and it’s one of only four pitches under one foot that were called strikes. But this list can be a little misleading. For example, Ronald Torreyes (who is legitimately a short guy) and Nicky Lopez are on it, but they also have low strike zones, so their pitches weren’t as far out of the zone as they look. Which called strikes were the most blatantly wrong, landing the furthest below the hitter’s actual strike zone bottom?
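A sketch of that ranking, under the same assumptions (a hypothetical statcast table with Savant’s columns, run here through Python’s sqlite3): keep only called strikes below sz_bot and sort by how far below the zone they landed.

```python
import sqlite3

# Hypothetical "statcast" table; sz_bot is the bottom of the batter's zone in feet.
# Rows with a NULL plate_z or sz_bot fail the comparison and are dropped.
FURTHEST_BELOW_ZONE = """
    SELECT batter, plate_z, sz_bot, (sz_bot - plate_z) * 12 AS inches_below
    FROM statcast
    WHERE description = 'called_strike'
      AND plate_z < sz_bot
    ORDER BY sz_bot - plate_z DESC
    LIMIT ?
"""


def furthest_below_zone(conn: sqlite3.Connection, n: int = 10) -> list:
    """Return the n called strikes that landed furthest below sz_bot."""
    return conn.execute(FURTHEST_BELOW_ZONE, (n,)).fetchall()
```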


This list of ten has a few carryovers from the previous one, but it gives a better answer to the question. Brandon Drury tops it: a pitch 8.64 inches below his strike zone bottom was called a strike. Here’s a clip of it, and if you guessed that he was upset with the call, you’d be correct. Aaron Judge also appears on this list, which isn’t surprising, since low strikes getting called on him because of his height has become a running theme. In his case in the table above, his strike zone bottom was over two feet above the ground. Here’s a clip of his pitch from the table.

So what about the opposite? Which pitches closest to the middle of the zone were called balls? I calculated this with the distance formula, $d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$, where $(x_1, y_1)$ is the center of the strike zone on that pitch (a plate_x of 0 and the midpoint of sz_top and sz_bot) and $(x_2, y_2)$ is the pitch location (plate_x, plate_z).
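Here’s a minimal sketch of that calculation in Python, registered as a SQLite function so the query can sort on it. The table name, the function name dist_from_center, and the description values I count as balls ('ball', 'blocked_ball') are my assumptions. Oracle has SQRT and POWER built in, so there you could write the formula directly in SQL.

```python
import math
import sqlite3


def distance_from_center(plate_x, plate_z, sz_top, sz_bot):
    """Distance in feet from the pitch to the center of that pitch's zone,
    i.e. the point (0, midpoint of sz_top and sz_bot)."""
    center_z = (sz_top + sz_bot) / 2
    return math.hypot(plate_x - 0.0, plate_z - center_z)


# Hypothetical "statcast" table; NULL locations are filtered out before
# the Python function is called.
CLOSEST_BALLS = """
    SELECT batter, plate_x, plate_z,
           dist_from_center(plate_x, plate_z, sz_top, sz_bot) * 12 AS inches_off
    FROM statcast
    WHERE description IN ('ball', 'blocked_ball')
      AND plate_x IS NOT NULL AND plate_z IS NOT NULL
      AND sz_top IS NOT NULL AND sz_bot IS NOT NULL
    ORDER BY inches_off ASC
    LIMIT ?
"""


def closest_balls(conn: sqlite3.Connection, n: int = 10) -> list:
    """Return the n called balls closest to the middle of the zone."""
    # Make the Python function callable from SQL as dist_from_center(...).
    conn.create_function("dist_from_center", 4, distance_from_center)
    return conn.execute(CLOSEST_BALLS, (n,)).fetchall()
```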


The first two pitches really grab my attention: how could a pitch less than three inches from down the middle get called a ball? Let’s watch what they look like. Here is the first pitch and here is the second pitch. Yep, both are right down the middle, but the catcher was clearly crossed up and not expecting a curveball, which led him to frame it in a funny way.

Which batters had the highest bottom of the strike zone? This one’s inspired by Aaron Judge.


You’ll notice most of these batters aren’t really batters; they’re tall pitchers who had to come up to bat. Also, when you picture a pitcher batting, they always seem to be standing upright, just waiting for the at-bat to be over. Here is what the 6’7” T.J. Zeuch looks like at the plate. Just as expected. If we filter to batters who saw at least 50 pitches, eliminating those awkward pitchers who rarely bat, Aaron Judge takes the number one spot.
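A sketch of the 50-pitch filter, assuming “highest bottom of the strike zone” means each batter’s average sz_bot (that’s my guess at the aggregation) and the same hypothetical statcast table queried through Python’s sqlite3:

```python
import sqlite3

# Average sz_bot per batter, keeping only batters who saw at least min_pitches.
HIGHEST_ZONE_BOTTOMS = """
    SELECT batter, AVG(sz_bot) AS avg_sz_bot, COUNT(*) AS pitches
    FROM statcast
    WHERE sz_bot IS NOT NULL
    GROUP BY batter
    HAVING COUNT(*) >= ?
    ORDER BY avg_sz_bot DESC
    LIMIT ?
"""


def highest_zone_bottoms(conn: sqlite3.Connection, min_pitches: int = 50, n: int = 10) -> list:
    """Return the n batters with the highest average sz_bot."""
    return conn.execute(HIGHEST_ZONE_BOTTOMS, (min_pitches, n)).fetchall()
```

Setting min_pitches=1 reproduces the unfiltered list, where rarely-batting pitchers can float to the top on a handful of pitches.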


Lastly, let’s look at how the strike zone expands based on the count. Foolish Baseball recently put out a video analyzing Aaron Judge’s strike zone (I swear I’m not copying him), and he uses a zone spanning plate_x values from -0.8 to 0.8 rather than -0.7083 to 0.7083, to account for pitches that only partially touch the strike zone, as I described at the beginning. That adds about 0.1 feet to each side, a bit less than the radius of a baseball (about 0.12 feet, or 1.45 inches), but I’ll use the same 0.1 foot additions in my calculations like he did. Similarly, I will add 0.1 feet to the bottom and top of the strike zone.
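In code, the padded zone works out to a simple check. This is a sketch of my own helper (the name should_be_ball and the constants are not from Statcast), using the 0.8 foot horizontal edge and 0.1 foot vertical padding described above:

```python
X_EDGE_FT = 0.8   # padded horizontal edge, following Foolish Baseball
Z_PAD_FT = 0.1    # padding added above sz_top and below sz_bot


def should_be_ball(plate_x: float, plate_z: float, sz_top: float, sz_bot: float) -> bool:
    """True if the pitch is outside the padded strike zone on any side."""
    outside_x = abs(plate_x) > X_EDGE_FT
    outside_z = plate_z < sz_bot - Z_PAD_FT or plate_z > sz_top + Z_PAD_FT
    return outside_x or outside_z
```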

It’s often said that the strike zone changes on a 3-0 count, with umpires more willing to give the pitcher a strike by expanding the zone a bit. Is this true? How do other counts behave?
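Here’s a sketch of how the count breakdown can be computed, with the padded zone from the previous paragraph written directly in SQL and run through Python’s sqlite3. As before, the statcast table name is hypothetical, and I’m assuming taken pitches are those whose description is 'called_strike', 'ball', or 'blocked_ball'.

```python
import sqlite3

# Among taken pitches outside the padded zone (|plate_x| > 0.8, or more than
# 0.1 ft above sz_top or below sz_bot), the share called strikes in each count.
STRIKE_RATE_BY_COUNT = """
    SELECT balls, strikes,
           COUNT(*) AS should_be_balls,
           SUM(CASE WHEN description = 'called_strike' THEN 1 ELSE 0 END) AS called_strikes,
           100.0 * SUM(CASE WHEN description = 'called_strike' THEN 1 ELSE 0 END)
               / COUNT(*) AS pct_called_strike
    FROM statcast
    WHERE description IN ('called_strike', 'ball', 'blocked_ball')
      AND (ABS(plate_x) > 0.8
           OR plate_z < sz_bot - 0.1
           OR plate_z > sz_top + 0.1)
    GROUP BY balls, strikes
    ORDER BY pct_called_strike DESC
"""


def strike_rate_by_count(conn: sqlite3.Connection) -> list:
    """Return (balls, strikes, should_be_balls, called_strikes, pct) per count."""
    return conn.execute(STRIKE_RATE_BY_COUNT).fetchall()
```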


We can see a clear correlation between how hitter-friendly a count is and how often a should-be ball is called a strike. The 3-0 count, the most hitter-friendly count, has by far the highest rate of should-be balls called strikes. As strikes are added, from 0-0 to 0-1 to 0-2, the percentage trends downward. It’s true: the strike zone umpires actually enforce depends on the count.
