I'm looking to determine which pitchers' pitch locations changed the most year over year. The code I've used so far is below.
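library(dplyr)

# Flag pitches in the low part of the zone, then compute per pitcher and season
# the number of pitches and the share that landed there
strike_zone_analysis <- final_2016 %>%
  mutate(low_zone = ifelse(between(plate_x, -1.10, 1.10) &
                           between(plate_z, 1.49, 2.17), 1, 0)) %>%
  group_by(pitcher_name, game_year) %>%
  summarise(n_of_pitches = n(),
            prop_low_zone = sum(low_zone) / n_of_pitches)

# Relative year-over-year change; dplyr's lag(), not stats::lag()
growth <- function(x) x / dplyr::lag(x) - 1

# mutate_each()/funs() are deprecated, so call growth() inside mutate();
# sort by season first because lag() depends on row order
YOY <- strike_zone_analysis_15_16 %>%
  group_by(pitcher_name) %>%
  arrange(game_year, .by_group = TRUE) %>%
  mutate(prop_low_zone = round(growth(prop_low_zone) * 100, 1))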
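A small self-contained sketch of the year-over-year step, using made-up proportions for two hypothetical pitchers ("Pitcher A" and "Pitcher B"), shows why the rows should be sorted by game_year before lag() is applied: Pitcher A's rows are deliberately out of order here, and without the arrange() call the 2017 row would be treated as the earlier season.

```r
library(dplyr)

# Hypothetical summary data, one row per pitcher per season.
# The numbers are invented purely for illustration.
strike_zone_summary <- data.frame(
  pitcher_name  = c("Pitcher A", "Pitcher A", "Pitcher B", "Pitcher B"),
  game_year     = c(2017, 2016, 2016, 2017),
  prop_low_zone = c(0.30, 0.20, 0.25, 0.20)
)

# Relative change versus the previous row: current / previous - 1
growth <- function(x) x / dplyr::lag(x) - 1

yoy <- strike_zone_summary %>%
  group_by(pitcher_name) %>%
  arrange(game_year, .by_group = TRUE) %>%   # put each pitcher's seasons in order
  mutate(prop_low_zone_yoy = round(growth(prop_low_zone) * 100, 1)) %>%
  ungroup()

print(yoy)
```

Each pitcher's first season gets NA because there is no earlier season to compare against. Taking the ratio gives a relative change; prop_low_zone - dplyr::lag(prop_low_zone) would instead give the change in percentage points.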
For this example, I've created the data frame below, which mostly matches the YOY data frame from the code above.
pitcher_name <- c('AJ Griffin', 'AJ Griffin', 'AJ Burnett', 'AJ Burnett',
                  'Zach Godley', 'Zach Godley')
game_year <- c(2016, 2017, 2016, 2017, 2016, 2017)
n_of_pitches <- c(456, 550, 1001, 1760, 1500, 1800)
pitching <- data.frame(pitcher_name, game_year, n_of_pitches)
I'm looking to isolate the pitchers in the dataframe that have thrown at least 500 pitches in both 2016 and 2017.
If I use
filter(pitching, n_of_pitches >= 500)
I'm left with all three pitchers, because the condition is checked row by row and AJ Griffin's 2017 row (550 pitches) still passes. I only want the pitchers that have thrown at least 500 pitches in both seasons (in this example, AJ Burnett and Zach Godley). I'm guessing there is a way to do this with dplyr's built-in filter() function, but I've been spinning my wheels trying to figure it out. Any input would be appreciated. Thanks!
library(tidyverse)
# create data set
pitcher_name <- c('AJ Griffin','AJ Griffin','AJ Burnett','AJ Burnett','Zach Godley','Zach Godley')
game_year <- c(2016, 2017, 2016, 2017, 2016, 2017)
n_of_pitches <- c(456, 550, 1001, 1760, 1500, 1800)
pitching <- data.frame(pitcher_name, game_year, n_of_pitches)
# keep pitchers who threw >= 500 pitches in every season listed
# (all() is evaluated separately within each pitcher's group)
pitching %>%
group_by(pitcher_name) %>%
filter(all(n_of_pitches >= 500))
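One caveat, sketched under the assumption that a pitcher might be missing a season entirely or that the data might contain other years: all(n_of_pitches >= 500) only looks at the rows a pitcher actually has, so a pitcher with a single 2017 row of 900 pitches would also be kept. The version below adds a hypothetical "Pitcher C" with exactly that profile, restricts the data to the two target seasons, and also requires that both seasons are present.

```r
library(dplyr)

# Example data from above plus a hypothetical pitcher ("Pitcher C")
# who appears only in 2017; the extra row is invented for illustration.
pitching <- data.frame(
  pitcher_name = c("AJ Griffin", "AJ Griffin", "AJ Burnett", "AJ Burnett",
                   "Zach Godley", "Zach Godley", "Pitcher C"),
  game_year    = c(2016, 2017, 2016, 2017, 2016, 2017, 2017),
  n_of_pitches = c(456, 550, 1001, 1760, 1500, 1800, 900)
)

target_years <- c(2016, 2017)

both_seasons <- pitching %>%
  filter(game_year %in% target_years) %>%        # ignore any other seasons
  group_by(pitcher_name) %>%
  filter(all(target_years %in% game_year),       # both seasons present
         all(n_of_pitches >= 500)) %>%           # and >= 500 pitches in each
  ungroup()

print(both_seasons)
```

Pitcher C would pass the shorter all(n_of_pitches >= 500) filter but is dropped here because there is no 2016 row.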