Jeppe Olsen
Jeppe Olsen

Reputation: 1008

Aggregate and Weighted Mean for multiple columns in R

The question is basically the samt as this: Aggregate and Weighted Mean in R.

But i want it to compute it on several columns, using data.table, as I have millions of rows. So something like this:

set.seed(42)   # fix seed so that you get the same results
dat <- data.frame(assetclass=sample(LETTERS[1:5], 20, replace=TRUE), 
                                    tax=rnorm(20),tax2=rnorm(20), assets=1e7+1e7*runif(20), assets2=1e6+1e7*runif(20))

DT <- data.table(dat)

I can compute the weighted mean on one column, assets, like this:

DT[,list(wret = weighted.mean(tax,assets)),by=assetclass]

But how to do it on both assets and assets2?
What if there are several columns, like col=c("assets1", "assets2", "assets3", ... )? And is it also possible to do it for tax, tax1...

Upvotes: 2

Views: 1773

Answers (1)

jogo
jogo

Reputation: 12559

So you can do it for several columns of weights

DT <- data.table(assetclass=sample(LETTERS[1:5], 20, replace=TRUE), 
                  tax=rnorm(20), assets=1e7+1e7*runif(20), asets2=1e6+1e7*runif(20))
DT[, lapply(.SD, FUN=weighted.mean, x=tax), by=assetclass, .SDcols=3:4]
#    assetclass      assets       asets2
# 1:          D -0.14179882 -0.003717957
# 2:          B  0.61146928  0.523913589
# 3:          E -0.28037796 -0.147677384
# 4:          C -0.09658125 -0.010338894
# 5:          A  0.74954460  0.750190947

or you can exclude the non-weight columns from .SD:

DT[, lapply(.SD, FUN=weighted.mean, x=tax), by=assetclass, .SDcols=-(1:2)]

Here is a variant using matrix multiplication:

DT[, as.list(crossprod(as.matrix(.SD), tax)/colSums(.SD)), by=assetclass, .SDcols=-(1:2)]

The matrix multiplication can do it also for several columns tax1, tax2, ...

DT <- data.table(assetclass=sample(LETTERS[1:5], 20, replace=TRUE), 
                 tax1=rnorm(20), tax2=rnorm(20), assets=1e7+1e7*runif(20), asets2=1e6+1e7*runif(20))
DT[, as.list(crossprod(as.matrix(.SD), tax1)/colSums(.SD)), by=assetclass, .SDcols=-(1:2)]
DT[, as.list(crossprod(as.matrix(.SD), tax2)/colSums(.SD)), by=assetclass, .SDcols=-(1:2)]
DT[, as.list(crossprod(as.matrix(.SD), cbind(tax1, tax2))/colSums(.SD)), by=assetclass, .SDcols=-(1:2)]

Upvotes: 4

Related Questions