Danielle

Reputation: 795

Create variable of proportions conditional on level of factor from subsets of data using tidyverse

I have a dataframe like so:

df <- data.frame(
  year   = c("1997", "1997", "1997", "1997", "1997", "1997", "1998", "1998"),
  season = c("W", "W", "W", "D", "D", "D", "W", "W"),
  result = c("Y", "Y", "N", "N", "Y", "N", "N", "N")
)

I would like to subset the data by year and season and calculate the proportion of "Y" values in result within each subset. The new column of proportions should be called psit_freq. An example of the output is below (I have written the proportions as fractions to help readers understand the calculation I need).

output <- data.frame(
  year      = c("1997", "1997", "1998"),
  season    = c("W", "D", "W"),
  psit_freq = c("2/3", "1/3", "0/2")
)

I have tried variations of:

df <- df %>%
  group_by(year, season) %>%
  summarise(psit_freq = freq())

But I am not sure how to incorporate a conditional (if/else) statement to calculate the proportion of "Y" responses out of the total number of result rows in each subset.

Upvotes: 1

Views: 381

Answers (2)

austensen

Reputation: 3017

All you need to do is convert result to an integer (or logical), group by year and season as you already do, and summarise by taking the mean of result. The mean of a 0/1 vector is the proportion of 1s, so no if/else is needed.


library(dplyr)

df <- tibble(
  year= c("1997", "1997","1997","1997","1997","1997","1998","1998"),
  season= c("W", "W","W","D","D","D","W","W"),
  result= c("Y", "Y","N","N","Y","N","N","N")
)

df %>% 
  mutate(result = recode(result, "Y" = 1L, "N" = 0L)) %>% 
  group_by(year, season) %>% 
  summarise(psit_freq = mean(result))

#> # A tibble: 3 x 3
#> # Groups:   year [?]
#>    year season psit_freq
#>   <chr>  <chr>     <dbl>
#> 1  1997      D 0.3333333
#> 2  1997      W 0.6666667
#> 3  1998      W 0.0000000
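The logical variant mentioned above can be sketched as follows. This is a minimal sketch, assuming dplyr is installed; the name `out` is only illustrative, and the trailing `ungroup()` is an optional addition that drops the leftover grouping by year.

```r
library(dplyr)

df <- tibble(
  year   = c("1997", "1997", "1997", "1997", "1997", "1997", "1998", "1998"),
  season = c("W", "W", "W", "D", "D", "D", "W", "W"),
  result = c("Y", "Y", "N", "N", "Y", "N", "N", "N")
)

# result == "Y" is a logical vector; mean() counts TRUE as 1 and FALSE as 0,
# so the mean is the share of "Y" rows in each group.
out <- df %>%
  group_by(year, season) %>%
  summarise(psit_freq = mean(result == "Y")) %>%
  ungroup()  # optional: remove the remaining grouping by year
```

This skips the `recode()` step and should give the same three proportions as the output above.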

Upvotes: 2

Brian

Reputation: 8295

library(dplyr)

# Count the "Y" rows in each year/season group and divide by the group size.
data.frame(year = as.character(c("1997", "1997", "1997", "1997", "1997", "1997", "1998", "1998")),
           season = as.character(c("W", "W", "W", "D", "D", "D", "W", "W")),
           result = as.character(c("Y", "Y", "N", "N", "Y", "N", "N", "N"))) %>%
  group_by(year, season) %>%
  summarise(psit_freq = sum(result == "Y") / length(result))
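If the numerator and denominator are also wanted, as in the "2/3" style of the question's example output, the same count-over-total idea can keep them as separate columns. This is a minimal sketch, assuming dplyr is installed; the names `counts`, `n_yes`, and `n_total` are hypothetical and not from the original posts.

```r
library(dplyr)

df <- data.frame(
  year   = c("1997", "1997", "1997", "1997", "1997", "1997", "1998", "1998"),
  season = c("W", "W", "W", "D", "D", "D", "W", "W"),
  result = c("Y", "Y", "N", "N", "Y", "N", "N", "N"),
  stringsAsFactors = FALSE
)

counts <- df %>%
  group_by(year, season) %>%
  summarise(
    n_yes     = sum(result == "Y"),  # number of "Y" rows in the group (hypothetical name)
    n_total   = n(),                 # total rows in the group (hypothetical name)
    psit_freq = n_yes / n_total      # proportion of "Y" rows
  ) %>%
  ungroup()
```

Here `n()` is dplyr's group-size helper and plays the role of `length(result)` in the answer above.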

Upvotes: 0
