MFR
MFR

Reputation: 2077

select id with a certain condition and measure average of the values of these ids in R

I wish to select ids and take average of ids by a certain condition. I'd like to measure the average of all the ids who had at least 1 time a particular factor

 mdf <- data.frame (id =c(1,2,3,2,1,2), 
factor= c("a","b","a","a","b" ,"a") , value = c(4,0,5,2,3,6))



   id factor value
1  1      a     4
2  2      b     0
3  3      a     5
4  2      a     2
5  1      b     3
6  2      a     6

For instance, I want average of ids who had b as factor, id 1 and 2 had b as a factor at list one time, So' my code should extract the values of id 1 and id 2 and measure their average (4 + 0 + 2 + 3 + 6) / 5 = 3. Note that id 3 never had b, so it is excluded from the calculation. so my ideal output will be

factor  avg
a      3.33
b      3

Upvotes: 0

Views: 115

Answers (2)

akrun
akrun

Reputation: 887241

After converting the data.frame to data.table (setDT(mdf)), we loop through the unique elements of 'factor', grouped by 'id', if any of the 'factor' have that particular element, subset the data.table (.SD), get the mean of the 'value' and the unique 'factor' element as a 2 column 'data.table' and rbind the list elements.

library(data.table)
setDT(mdf)
rbindlist(lapply(unique(mdf$factor), function(x) {
       x1 <- mdf[, if(any(factor==x)) .SD, id][, .(factor= x, avg=mean(value))]}))
#   factor      avg
#1:      a 3.333333
#2:      b 3.000000

Upvotes: 2

Sotos
Sotos

Reputation: 51592

An idea via base R,

fun1 <- function(df, f, include = TRUE){ 
  ind <- unique(df$id[df$factor == f])
  if (include == TRUE) {
  return(mean(df$value[df$id %in% ind]))
  } else {
  return(mean(df$value[!df$id %in% ind])) 
  }
}

sapply(levels(mdf$factor), function(i) fun1(mdf, i))
#       a        b 
#3.333333 3.000000 

sapply(levels(mdf$factor), function(i) fun1(mdf, i, include = FALSE))
#  a   b 
#NaN   5 

Upvotes: 2

Related Questions