Juanchi

Reputation: 1166

Calculate proportion of positive values by group

With this dataframe:

table <- "
    trt rep ss  d1  d4  d5  d6  d7
    1   1   1   0   0   0   0   0
    1   1   2   0   0   0   0   0
    1   1   3   0   0   1   2   2
    1   2   1   0   0   1   3   6
    1   2   2   0   1   1   2   4
    1   2   3   0   0   0   1   1
    1   3   1   0   0   0   0   0
    1   3   2   0   0   0   0   0
    1   3   3   0   1   1   1   1
    2   1   1   0   0   0   0   0
    2   1   2   0   0   0   1   1
    2   1   3   0   0   0   1   1
    2   2   1   0   0   0   0   0
    2   2   2   0   0   0   0   0
    2   2   3   0   0   0   0   1
    2   3   1   0   0   0   0   0
    2   3   2   0   0   0   1   3
    2   3   3   .   .   .   .   .
    "
d <- read.table(text = table, header = TRUE, check.names = FALSE, na.strings = ".")

I'd like to obtain a data frame with the proportion of positive values by trt for each day (d1, d4, ..., d7), like this table:

# trt    d1    d4     d5    d6    d7
# 1    0.00  0.22   0.44  0.56  0.56
# 2    0.00  0.00   0.00  0.38  0.50

Upvotes: 1

Views: 293

Answers (3)

Philip

Reputation: 7293

Using data.table, something like this:

library(data.table)
d <- data.table(d)
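# For each day column: positives divided by non-missing observations, per trt
d[, lapply(.SD, function(x) sum(x > 0, na.rm = TRUE) / sum(!is.na(x))),
  .SDcols = grep("^d", names(d), value = TRUE),
  by = trt]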

   trt d1        d4        d5        d6        d7
1:   1  0 0.2222222 0.4444444 0.5555556 0.5555556
2:   2  0 0.0000000 0.0000000 0.3750000 0.5000000

Upvotes: 4

Frank

Reputation: 66819

Thanks to @A.Webb, here's a way in base R:

aggregate(d[,4:8]>0~d$trt, FUN = mean)

#   d$trt d1        d4        d5        d6        d7
# 1     1  0 0.2222222 0.4444444 0.5555556 0.5555556
# 2     2  0 0.0000000 0.0000000 0.3750000 0.5000000

Here was my original idea:

rowsum(+(d[-(1:3)] > 0), d$trt, na.rm=TRUE) / 
  rowsum(+!is.na(d[-(1:3)]), d$trt, na.rm=TRUE)

The unary + is there to coerce the logical matrices to integer, because rowsum only works with numbers, not with logicals.
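The coercion can be checked on a tiny example. This is only a sketch: the matrix x and the grouping vector g are made-up stand-ins, not the question's data.

```r
# Hypothetical toy data: one day column, two groups, one missing value
x <- matrix(c(0, 2, NA, 1), ncol = 1, dimnames = list(NULL, "d6"))
g <- c(1, 1, 2, 2)

pos <- x > 0          # logical matrix; the NA stays NA
is.numeric(pos)       # FALSE: a logical matrix is not numeric
is.numeric(+pos)      # TRUE: unary + turns TRUE/FALSE into 1L/0L

num <- rowsum(+pos, g, na.rm = TRUE)   # positives per group, NAs skipped
den <- rowsum(+!is.na(x), g)           # non-missing observations per group
prop <- num / den                      # proportion of positives per group
prop
```

Dividing by the count of non-missing values, rather than by the group size, keeps the missing row out of the denominator.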

Upvotes: 6

akrun

Reputation: 887541

We can use dplyr:

library(dplyr)
d %>%
  group_by(trt) %>% 
  summarise_each( funs(round(mean(.>0, na.rm=TRUE),2)), d1:d7) 
#   trt    d1    d4    d5    d6    d7
#  (int) (dbl) (dbl) (dbl) (dbl) (dbl)
#1     1     0  0.22  0.44  0.56  0.56
#2     2     0  0.00  0.00  0.38  0.50
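Later dplyr releases deprecate summarise_each() and funs(). A possible equivalent with across(), assuming dplyr 1.0.0 or newer, is sketched here on a small made-up data frame (d_demo is hypothetical, not the question's data):

```r
library(dplyr)  # across() assumes dplyr >= 1.0.0

# Hypothetical stand-in data: two treatments, one missing observation
d_demo <- data.frame(
  trt = c(1, 1, 1, 2, 2, 2),
  d6  = c(0, 2, 3, 0, 1, NA),
  d7  = c(0, 2, 6, 1, 3, NA)
)

res <- d_demo %>%
  group_by(trt) %>%
  # mean of a logical vector is the share of TRUEs; na.rm drops missing rows
  summarise(across(d6:d7, ~ round(mean(.x > 0, na.rm = TRUE), 2)))
res
```

On the question's data, the column selection would be d1:d7 instead of d6:d7.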

Upvotes: 3
