HOSS_JFL
HOSS_JFL

Reputation: 839

aggregation of column over list of grouping variables

Assume there is a data.table with the IDs 1,2 and 3 and a VALUE column. There is also a list of permutations of these IDs that define the subsets I would like to calculate a statistic for. E.g. calculate the mean of VALUE for the three groups ID in {1 ,2}, {2 and 3}, {1 and 3}:

library(data.table)
DT <- data.table(ID = c(1,1,2,2,3,3),VALUE = c(1,2,10,20,100,200))
permutations <- list(c(1,2),c(2,3),c(1,3))

How can I generate the desired output

permutation MEAN
    1        8.25
    2        82.5
    3        75.75

My pseudocode without a loop or apply would look something like this but of course it does not work:

DT[,.(MEAN = mean(VALUE)),by = .(ID %in% permutations)]

Is it possible to do that without a loop or a call to apply? The takes very long for my table.

Upvotes: 1

Views: 34

Answers (2)

akrun
akrun

Reputation: 887118

We can do a join the 'data.table' with the 'permutations' after converting ito into a 'data.frame' and then do a group by mean

DT[stack(setNames(permutations, seq_along(permutations))), 
     on = .(ID = values)][, .(MEAN = mean(VALUE)), by = .(permutation = ind)] 
#   permutation  MEAN
#1:           1  8.25
#2:           2 82.50
#3:           3 75.75

Upvotes: 1

Sotos
Sotos

Reputation: 51592

A base R approach would be,

sapply(permutations, function(i) {x <- DT[DT$ID %in% i]; mean(x$VALUE)})
#[1]  8.25 82.50 75.75

Upvotes: 1

Related Questions