Posts

How Do Tendencies Change on 3-0 Counts, and Should Batters Swing More Often?

Let’s look at the Statcast database again, this time examining 3-0 pitches. In the original article I wrote introducing the database, I used the outcomes of 3-0 pitches as an example query. 3-0 pitches have always been a big deal in baseball. Batters commonly take a pitch on 3-0 (89% of the time, in fact). The thinking is that if a pitcher’s control is shaky enough to fall behind 3-0, he just might throw ball four, so the batter should make him prove that he can throw a strike. And if the pitch ends up being a strike, the batter still has a hitter-friendly 3-1 count. In recent years, it has felt to me like batters were swinging on 3-0 more often. I remember Aaron Judge giving a quote (unfortunately I can’t find it) about how when he hits, he’s just looking to hit the best pitch of the at-bat, which often comes on the 3-0 count. I will admit, though, that I’m surprised to see batters are still taking 3-0 pitches at an 89% rate. I would have guessed much lower, but for all I know this c...

Analyzing Strike Zone Data From the Statcast Database

With my Statcast database in Oracle nice and handy, let’s have some more fun and look at strike zone data. Strike zone data in Statcast is measured by two variables, plate_x and plate_z, which act like the x and y coordinates on a pair of axes. The strike zone is the width of home plate, which is 17 inches, although the rule book says that only part of the baseball has to cross the strike zone for the pitch to be a strike, so truthfully the strike zone is slightly wider than 17 inches. A baseball is roughly three inches in diameter, so really the strike zone width is a number slightly smaller than 23 inches. Statcast measures plate_x and plate_z in feet, not inches. The middle of the strike zone width is a plate_x of 0 (like an axis origin of (0,0)), with pitches to the left having a negative plate_x and pitches to the right having a positive plate_x. For example, the strike zone is 17 inches wide (1.417 feet), so the left edge has a value of -0.7083 and the right edge has a ...
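As a rough illustration of the arithmetic in this excerpt, here is a minimal Python sketch that converts the plate width to feet and checks whether a plate_x value falls inside the horizontal zone. The function names are hypothetical, the 2.9-inch ball diameter is an assumed value (the post rounds it to three inches), and the sketch assumes plate_x tracks the center of the ball. Under that assumption, a ball that just touches the edge of the plate has its center half a ball-width outside the edge, even though the outer edge of that ball reaches the "slightly smaller than 23 inches" span described above.

```python
PLATE_WIDTH_IN = 17.0      # width of home plate, from the post
BALL_DIAMETER_IN = 2.9     # assumption: approximate baseball diameter in inches
INCHES_PER_FOOT = 12.0


def zone_half_width_ft(include_ball=True):
    """Distance in feet from the middle of the plate to the zone edge.

    With include_ball=True, the edge is pushed out by one ball radius,
    so a pitch whose center is that far out still clips the plate.
    """
    half_width_in = PLATE_WIDTH_IN / 2
    if include_ball:
        half_width_in += BALL_DIAMETER_IN / 2
    return half_width_in / INCHES_PER_FOOT


def in_horizontal_zone(plate_x, include_ball=True):
    """True if plate_x (feet, 0 = middle of the plate) is within the zone width."""
    return abs(plate_x) <= zone_half_width_ft(include_ball)


# Plate-only edge in feet, matching the 0.7083 figure in the post.
print(round(zone_half_width_ft(include_ball=False), 4))
# Edge for the ball's center once the ball radius is included.
print(round(zone_half_width_ft(), 4))
```

A plate_x of 0.8, for instance, is off the plate by the plate-only measure but still counts as inside the zone once the ball's radius is taken into account.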

Introducing the Full Statcast Database (2019-2021)

Back in July, my school partners and I created a similar data warehouse for every pitch thrown in the 2021 season up until June 23. The project was a lot of fun and you can read a PDF version of it here under the Data Warehousing section. I wanted to continue that project for all of 2021 and beyond, and have finally completed it. Baseball Savant contains data from every season since 2008, and Statcast metrics were measured starting in 2015. In a perfect world, my database would include everything since 2008, or at the very least everything since 2015 to make a "Statcast Era" database, but unfortunately my student account in Oracle has limited tablespace, so I could only fit 2019, 2020, and 2021. Across those three seasons, there were a total of 1,743,857 pitches thrown. My database is set up in a similar dimensional model "warehouse" style. The "fact table" contains all the pitches that I just described, with all their details such as velocity, spin rate, pitch ...