Reputation: 795
I have a dataframe like so:
df <- data.frame(
  year   = as.character(c("1997", "1997", "1997", "1997", "1997", "1997", "1998", "1998")),
  season = as.character(c("W", "W", "W", "D", "D", "D", "W", "W")),
  result = as.character(c("Y", "Y", "N", "N", "Y", "N", "N", "N"))
)
I would like to subset the data by year and season and calculate the proportion of "Y" in result for that particular subset. The new column of proportions is called psit_freq. An example of the output is below (I have written the proportions as fractions to help readers understand the calculation I need).
output <- data.frame(
  year      = as.character(c("1997", "1997", "1998")),
  season    = as.character(c("W", "D", "W")),
  psit_freq = as.character(c("2/3", "1/3", "0/2"))
)
I have tried variations of:
df <- df %>%
  group_by(year, season) %>%
  summarise(psit_freq = freq())
But I am not sure how to incorporate a conditional (if/else) statement to calculate the proportion of Y responses out of the total number of result rows in each subset.
Upvotes: 1
Views: 381
Reputation: 3017
All you need to do is convert result to an integer (or logical), then group by year and season as you already do, and summarise by taking the mean of result. The mean of a 0/1 column is the proportion of 1s, so no if/else is needed.
library(dplyr)
df <- tibble(
  year   = c("1997", "1997", "1997", "1997", "1997", "1997", "1998", "1998"),
  season = c("W", "W", "W", "D", "D", "D", "W", "W"),
  result = c("Y", "Y", "N", "N", "Y", "N", "N", "N")
)
df %>%
  mutate(result = recode(result, "Y" = 1L, "N" = 0L)) %>%
  group_by(year, season) %>%
  summarise(psit_freq = mean(result))
#> # A tibble: 3 x 3
#> # Groups:   year [?]
#>    year season psit_freq
#>   <chr>  <chr>     <dbl>
#> 1  1997      D 0.3333333
#> 2  1997      W 0.6666667
#> 3  1998      W 0.0000000
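The "or logical" route mentioned above can be sketched without recode() at all: comparing result to "Y" inside summarise() yields a logical vector, and mean() counts TRUE as 1 and FALSE as 0. This is a minimal sketch and not necessarily what the answer had in mind. The name props is only illustrative, and the .groups argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)

df <- tibble(
  year   = c("1997", "1997", "1997", "1997", "1997", "1997", "1998", "1998"),
  season = c("W", "W", "W", "D", "D", "D", "W", "W"),
  result = c("Y", "Y", "N", "N", "Y", "N", "N", "N")
)

# props is an illustrative name, not from the original post
props <- df %>%
  group_by(year, season) %>%
  # result == "Y" is logical; mean() treats TRUE as 1 and FALSE as 0
  summarise(psit_freq = mean(result == "Y"), .groups = "drop")

print(props)
```

Because the comparison happens inside summarise(), the original result column stays untouched, which may be convenient if it is needed later.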
Upvotes: 2
Reputation: 8295
library(dplyr)
data.frame(
  year   = as.character(c("1997", "1997", "1997", "1997", "1997", "1997", "1998", "1998")),
  season = as.character(c("W", "W", "W", "D", "D", "D", "W", "W")),
  result = as.character(c("Y", "Y", "N", "N", "Y", "N", "N", "N"))
) %>%
  group_by(year, season) %>%
  # proportion of "Y" = number of "Y" rows divided by the group's row count
  summarise(psit_freq = sum(result == "Y") / length(result))
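If the fraction form shown in the question ("2/3", "1/3", "0/2") is wanted alongside the numeric proportion, the numerator and denominator of the ratio above can be kept as separate columns. The following is a rough base R sketch using aggregate(), so it does not need dplyr. The names is_yes, n_yes, n_total, psit_frac and out are made up for illustration.

```r
df <- data.frame(
  year   = c("1997", "1997", "1997", "1997", "1997", "1997", "1998", "1998"),
  season = c("W", "W", "W", "D", "D", "D", "W", "W"),
  result = c("Y", "Y", "N", "N", "Y", "N", "N", "N"),
  stringsAsFactors = FALSE
)

# Helper column (illustrative name): TRUE where result is "Y"
df$is_yes <- df$result == "Y"

# Number of "Y" rows per year/season group
out <- aggregate(is_yes ~ year + season, data = df, FUN = sum)
names(out)[names(out) == "is_yes"] <- "n_yes"

# Total rows per group; same grouping, so the rows line up with out
out$n_total <- aggregate(is_yes ~ year + season, data = df, FUN = length)$is_yes

# Numeric proportion and the fraction written as text
out$psit_freq <- out$n_yes / out$n_total
out$psit_frac <- paste0(out$n_yes, "/", out$n_total)

print(out)
```

The psit_frac column is text and only useful for display; psit_freq is the value to use in any further calculation.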
Upvotes: 0