Reputation: 2263
I have a data.frame as follows. I would like to know the proportion of non-zero Total_dose
values grouped by percent
and is_infectious
please. Note: There aren't the same number of rows in all percent groups.
df<-data.frame(Total_dose=runif(1000),percent=rep(c("50","60","75","90"),each=250),is_infectious=sample(c(0,1),1000, replace = TRUE))
Insert some 0 values:
df$Total_dose[sample(NROW(df),200)]<-0
I'm part way there, it's just I can't seem to keep the information about the total number of rows for each group and then calculate the fraction of non-zero Total_dose. Any help would be much appreciated.
df%>%
group_by(percent,is_infectious)%>%
summarise_at(vars(Total_dose), ~sum(. != 0))
Desired output:
N Total_dose>0 | percent | is_infectious | fraction
50/56000 | 50 | 1 | 0.0008928571
45000/56000 | 50 | 0 | 0.8035714
2000/56000 | 60 | 1 |
48000/56000 | 60 | 0 |
An attempt at this based on an answer that has since disappeared and making it generic.
df%>%
rename(variable=percent) %>%
select(Total_dose,variable) %>%
group_by(variable) %>%
mutate(n=n()) %>%
collect() %>%
group_by(variable) %>%
dplyr::summarise(prop_non_zero = round(sum(Total_dose != 0)/ n *100)) %>%
group_by(variable) %>%
summarise(prop_non_zero=median(prop_non_zero)) %>%
arrange(variable)
Upvotes: 0
Views: 232
Reputation: 5429
Perhaps it doesn't have to be more complicated than this:
library(glue)
df %>%
group_by(percent, is_infectious) %>%
summarise(
fraction = sum( Total_dose != 0) / sum( df$percent %in% percent ),
`N Total_dose>0` = glue("{ sum(Total_dose !=0) }/{ sum(df$percent %in% percent ) }")
) %>% relocate( `N Total_dose>0` )
Output:
`N Total_dose>0` percent is_infectious fraction
<glue> <dbl> <dbl> <dbl>
1 110/250 50 0 0.44
2 93/250 50 1 0.372
3 86/250 60 0 0.344
4 106/250 60 1 0.424
5 101/250 75 0 0.404
6 104/250 75 1 0.416
7 111/250 90 0 0.444
8 89/250 90 1 0.356
If you object to looking up the original df
like that, you could group by percent to get group size, then group by is_infectous as well and do the rest from there:
df %>%
group_by( percent ) %>%
mutate( n = n() ) %>%
group_by(percent,is_infectious) %>%
summarise(
fraction = sum( Total_dose != 0) / first(n),
`N Total_dose>0` = glue("{ sum(Total_dose !=0) }/{ first(n) }")
) %>% relocate( `N Total_dose>0` )
Same output.
Upvotes: 1