Reputation: 17
Compute a table but with rates computed over 1999-2001. Keep only rows from 1999-2001 where players have 100 or more plate appearances, calculate each player's single rate and BB rate per season, then calculate the average single rate (mean_singles) and average BB rate (mean_bb) per player over those three seasons.
How many players had a single rate mean_singles of greater than 0.2 per plate appearance over 1999-2001?
library(tidyverse)
library(Lahman)
bat_02 <- Batting %>% filter(yearID %in% c("1999","2000","2001")) %>%
mutate(pa = AB + BB, singles = (H - X2B - X3B - HR)/pa, bb = BB/pa) %>%
filter(pa >= 100) %>%
select(playerID, singles, bb)
bat_02 <- bat_02 %>% filter(singles > .2)
nrow(bat_02)
I filtered the table so that it contains only players with 100 or more plate appearances in 1999-2001, then filtered the singles column with the condition singles > 0.2. This code gives an output of 133, which is not correct. Is there a mistake in my code?
Upvotes: 0
Views: 783
Reputation: 1
The following worked for me.
How many players had a single rate mean_singles of greater than 0.2 per plate appearance over 1999-2001?
library(dplyr)
library(Lahman)
bat_02 <- Batting %>% filter(yearID == 2002) %>%
mutate(pa = AB + BB, singles = (H - X2B - X3B - HR)/pa, bb = BB/pa) %>%
filter(pa >= 100) %>%
select(playerID, singles, bb)
bat_99_01 <- Batting %>% filter(yearID %in% 1999:2001) %>%
mutate(pa = AB + BB, singles = (H - X2B - X3B - HR)/pa, bb = BB/pa) %>%
filter(pa >= 100) %>%
group_by(playerID) %>%
summarize(mean_singles = mean(singles), mean_bb = mean(bb))
sum(bat_99_01$mean_singles > 0.2)
# The result:
[1] 46
How many players had a BB rate mean_bb of greater than 0.2 per plate appearance over 1999-2001?
sum(bat_99_01$mean_bb > 0.2)
# Answer:
[1] 3
Upvotes: 0
Reputation: 21
Here's the code that properly computes the required averages:
library(Lahman)
library(dplyr)
# Compute required averages for years 1999-2001
averages <- Batting %>% filter(yearID %in% c("1999","2000","2001")) %>%
mutate(pa = AB + BB, singles = (H - X2B - X3B - HR)/pa, bb = BB/pa) %>%
filter(pa >= 100) %>%
group_by(playerID) %>%
summarize(mean_singles = mean(singles), mean_bb = mean(bb)) %>%
select(playerID, mean_singles, mean_bb)
# Select mean_singles and mean_bb higher than 0.2 as required by the task
averages %>% filter(mean_singles > 0.2) %>% nrow(.)
averages %>% filter(mean_bb > 0.2) %>% nrow(.)
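The key here is the summarize operation, which computes each player's averages across seasons after grouping by playerID (see the group_by(playerID) step). Without it, filter(singles > .2) counts individual player-season rows rather than players, which is why the code in the question returns 133.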
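To make the difference concrete, here is a minimal sketch on a made-up toy table (the players "a" and "b" and their rates are hypothetical, not taken from Lahman). It compares filtering rows directly with averaging per player first and then filtering.

```r
library(dplyr)

# Hypothetical toy data: two players, three seasons each (values are invented)
toy <- tibble(
  playerID = c("a", "a", "a", "b", "b", "b"),
  yearID   = c(1999, 2000, 2001, 1999, 2000, 2001),
  singles  = c(0.25, 0.22, 0.21, 0.21, 0.15, 0.12)
)

# Filtering rows directly counts player-seasons above 0.2
season_count <- toy %>%
  filter(singles > 0.2) %>%
  nrow()

# Averaging per player first, then filtering, counts players
player_count <- toy %>%
  group_by(playerID) %>%
  summarize(mean_singles = mean(singles)) %>%
  filter(mean_singles > 0.2) %>%
  nrow()

season_count  # 4 player-seasons
player_count  # 1 player
```

Player "b" has one season above 0.2 but a three-season mean below it, so that player is counted by the row-level filter and dropped by the per-player one.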
Upvotes: 0
Reputation: 19169
This is my take on the question.
library(Lahman)
library(dplyr)
str(Batting)
Batting %>%
#Compute a table but with rates computed over 1999-2001.
filter(yearID %in% c("1999","2000","2001")) %>%
#Keep only rows from 1999-2001 where players have 100 or more plate appearances
mutate(pa = AB + BB) %>%
filter(pa >= 100) %>%
#calculate each player's single rate and BB rate per season
group_by(playerID, yearID) %>%
summarise(singles = (H - X2B - X3B - HR)/pa, bb = BB/pa) %>%
#then calculate the average single rate (mean_singles) and average BB rate (mean_bb) per player over those three seasons.
group_by(yearID) %>%
summarise(mean_single=mean(singles), mean_bb=mean(bb))
# A tibble: 3 x 3
  yearID mean_single mean_bb
   <int>       <dbl>   <dbl>
1   1999       0.137  0.0780
2   2000       0.140  0.0765
3   2001       0.132  0.0634
Or perhaps the question wanted just the overall rates:
#then calculate the average single rate (mean_singles) and average BB rate (mean_bb) per player over those three seasons.
ungroup() %>%
summarise(mean_single=mean(singles, na.rm=TRUE), mean_bb=mean(bb, na.rm=TRUE))
# A tibble: 1 x 2
  mean_single mean_bb
        <dbl>   <dbl>
1       0.136  0.0726
Upvotes: 3