Reputation: 2077
I wish to select ids and take average of ids by a certain condition. I'd like to measure the average of all the ids who had at least 1 time a particular factor
mdf <- data.frame (id =c(1,2,3,2,1,2),
factor= c("a","b","a","a","b" ,"a") , value = c(4,0,5,2,3,6))
id factor value
1 1 a 4
2 2 b 0
3 3 a 5
4 2 a 2
5 1 b 3
6 2 a 6
For instance, I want average of ids who had b
as factor, id 1 and 2 had b
as a factor at list one time, So' my code should extract the values of id 1 and id 2 and measure their average (4 + 0 + 2 + 3 + 6) / 5 = 3. Note that id 3 never had b
, so it is excluded from the calculation.
so my ideal output will be
factor avg
a 3.33
b 3
Upvotes: 0
Views: 115
Reputation: 887241
After converting the data.frame
to data.table
(setDT(mdf)
), we loop through the unique
elements of 'factor', grouped by 'id', if
any
of the 'factor' have that particular element, subset the data.table (.SD
), get the mean
of the 'value' and the unique
'factor' element as a 2 column 'data.table' and rbind
the list
elements.
library(data.table)
setDT(mdf)
rbindlist(lapply(unique(mdf$factor), function(x) {
x1 <- mdf[, if(any(factor==x)) .SD, id][, .(factor= x, avg=mean(value))]}))
# factor avg
#1: a 3.333333
#2: b 3.000000
Upvotes: 2
Reputation: 51592
An idea via base R,
fun1 <- function(df, f, include = TRUE){
ind <- unique(df$id[df$factor == f])
if (include == TRUE) {
return(mean(df$value[df$id %in% ind]))
} else {
return(mean(df$value[!df$id %in% ind]))
}
}
sapply(levels(mdf$factor), function(i) fun1(mdf, i))
# a b
#3.333333 3.000000
sapply(levels(mdf$factor), function(i) fun1(mdf, i, include = FALSE))
# a b
#NaN 5
Upvotes: 2