HCAI
HCAI

Reputation: 2263

Find fraction of non-zero values by group

I have a data.frame as follows. I would like to know the proportion of non-zero Total_dose values grouped by percent and is_infectious please. Note: There aren't the same number of rows in all percent groups.

df<-data.frame(Total_dose=runif(1000),percent=rep(c("50","60","75","90"),each=250),is_infectious=sample(c(0,1),1000, replace = TRUE))

Insert some 0 values:

df$Total_dose[sample(NROW(df),200)]<-0

I'm part way there, it's just I can't seem to keep the information about the total number of rows for each group and then calculate the fraction of non-zero Total_dose. Any help would be much appreciated.

df%>%
  group_by(percent,is_infectious)%>%
  summarise_at(vars(Total_dose), ~sum(. != 0))

Desired output:

N Total_dose>0 | percent | is_infectious | fraction
50/56000 | 50 | 1 | 0.0008928571
45000/56000 | 50 | 0 | 0.8035714
2000/56000 | 60 | 1 |
48000/56000 | 60 | 0 |

An attempt at this based on an answer that has since disappeared and making it generic.

df%>%
  rename(variable=percent) %>% 
  select(Total_dose,variable) %>% 
  group_by(variable) %>% 
  mutate(n=n()) %>% 
  collect() %>% 
  group_by(variable) %>% 
  dplyr::summarise(prop_non_zero = round(sum(Total_dose != 0)/ n *100)) %>% 
  group_by(variable) %>% 
  summarise(prop_non_zero=median(prop_non_zero)) %>% 
  arrange(variable)

Upvotes: 0

Views: 232

Answers (1)

Sirius
Sirius

Reputation: 5429

Perhaps it doesn't have to be more complicated than this:

library(glue)

df %>%
    group_by(percent, is_infectious) %>%
    summarise(
        fraction = sum( Total_dose != 0) / sum( df$percent %in% percent ),
        `N Total_dose>0` = glue("{ sum(Total_dose !=0) }/{ sum(df$percent %in% percent ) }")
    ) %>% relocate( `N Total_dose>0` )


Output:


  `N Total_dose>0` percent is_infectious fraction
  <glue>             <dbl>         <dbl>      <dbl>
1 110/250               50             0      0.44 
2 93/250                50             1      0.372
3 86/250                60             0      0.344
4 106/250               60             1      0.424
5 101/250               75             0      0.404
6 104/250               75             1      0.416
7 111/250               90             0      0.444
8 89/250                90             1      0.356

If you object to looking up the original df like that, you could group by percent to get group size, then group by is_infectous as well and do the rest from there:


df %>%
    group_by( percent ) %>%
    mutate( n = n() ) %>%
    group_by(percent,is_infectious) %>%
    summarise(
        fraction = sum( Total_dose != 0) / first(n),
        `N Total_dose>0` = glue("{ sum(Total_dose !=0) }/{ first(n) }")
    ) %>% relocate( `N Total_dose>0` )

Same output.

Upvotes: 1

Related Questions