Calculate the sum of the counts of a factor variable, as a subset of a dataframe in R

Question

I am trying to summarise how many people in my data had surgery and then died, so that I can calculate the mortality rate for surgery patients.

My data looks like this:

df <- data.frame(
    y1988 = rep(c('Y', 'Y', 'Y', 'M', 'D', 'Y', 'Y', 'D', 'X', 'D'), 25),
    y1989 = rep(c('Y', 'M', 'D', 'Y', 'X', 'Y', 'X', 'Y', 'Y', 'Y'), 25),
    y1990 = rep(c('D', 'Y', 'D', 'X', 'Y', 'M', 'D', 'Y', 'Y', 'Y'), 25),
    y1991 = rep(c('D', 'Y', 'Y', 'M', 'D', 'Y', 'Y', 'X', 'D', 'Y'), 25),
    age = rep(20:69, 5),
    ID = 1:250
)

What I want to do is, for each age and each year column (y1988 to y1991), count the number of 'D' and divide it by the number of 'Y'.

If I were to do this manually, I would subset the dataframe for each age and then divide the count of 'D' by the count of 'Y', e.g.

library(dplyr)  # filter() here is dplyr::filter

a21 <- filter(df, age == 21)
a21$mort1988 <- sum(a21$y1988 == 'D') / sum(a21$y1988 == 'Y')
a21$mort1989 <- sum(a21$y1989 == 'D') / sum(a21$y1989 == 'Y')

etc

This seems absurd. Is there a more efficient way to do this?

akrun · Accepted Answer

We can use summarise_at to do the division for each of the year columns (y1988 to y1991) after grouping by 'age':

library(dplyr)

df %>%
    group_by(age) %>%
    # per age: count of 'D' divided by count of 'Y' in each y<year> column
    summarise_at(vars(matches("y\\d{4}")), ~ sum(. == "D") / sum(. == "Y"))
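The same grouped ratio can also be written with across(), which newer dplyr releases offer in place of the summarise_at family. This is only a sketch, assuming dplyr 1.0.0 or later is installed. The result name mort and the anchored pattern "^y\\d{4}$" are illustrative choices, not part of the original answer.

```r
library(dplyr)

# Sample data copied from the question
df <- data.frame(
    y1988 = rep(c('Y', 'Y', 'Y', 'M', 'D', 'Y', 'Y', 'D', 'X', 'D'), 25),
    y1989 = rep(c('Y', 'M', 'D', 'Y', 'X', 'Y', 'X', 'Y', 'Y', 'Y'), 25),
    y1990 = rep(c('D', 'Y', 'D', 'X', 'Y', 'M', 'D', 'Y', 'Y', 'Y'), 25),
    y1991 = rep(c('D', 'Y', 'Y', 'M', 'D', 'Y', 'Y', 'X', 'D', 'Y'), 25),
    age = rep(20:69, 5),
    ID = 1:250
)

# `mort` is a hypothetical name: one row per age, one ratio per year column
mort <- df %>%
    group_by(age) %>%
    summarise(across(matches("^y\\d{4}$"),
                     ~ sum(.x == "D") / sum(.x == "Y")))

head(mort)
```

Where an age group has no 'Y' in a given year, the ratio comes out as Inf (some 'D', no 'Y') or NaN (neither), so those cells may need handling before the rates are reported.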
