Reputation: 609
My question relates to this previously asked question:
Calculating a weighted mean using data.table in R with weights in one of the table columns
In my case, I have different weights-columns across the columns I want to aggregate. Let's say I have four columns col_a, col_b, col_c
and col_d
and let's assume I want to aggregate col_a
and col_b
with weiths w_1
and col_c
, col_d
with w_2
. Example:
require(data.table)
id <- c(1,1,1,2,2,2)
col_a <- c(123,56,87,987,1003,10)
col_b <- c(17,234,20,88,765,69)
col_c <- c(45,90,543,30,1,543)
col_d <- c(60,43,700,3,88,46)
w_1 <- c(1,1,1,1,1,1)
w_2 <- c(1.5,1,1.2,0.8,1,1)
dt <- data.table(id, col_a, col_b, col_c, col_d, w_1, w_2);dt
Now the desired result would look like this:
data.table(id=c(1,2),col_a=c(weighted.mean(col_a[1:3],w_1[1:3]),weighted.mean(col_a[4:6],w_1[4:6])),col_b=c(weighted.mean(col_b[1:3],w_1[1:3]),weighted.mean(col_b[4:6],w_1[4:6])),
col_c=c(weighted.mean(col_c[1:3],w_2[1:3]),weighted.mean(col_c[4:6],w_1[4:6])),col_d=c(weighted.mean(col_d[1:3],w_2[1:3]),weighted.mean(col_d[4:6],w_2[4:6])))
This, I thought could be accomplished similar to @akrun answer to this post:
R collapse multiple rows into 1 row using specific function to each column
where I would have the two functions weighted.mean(x, w_1)
and weighted.mean(x, w_2)
instead of min
or median
.
Here is how far I got:
colsToKeep <- c("col_a","col_b","col_c","col_d")
dt[, Map(function(x,y) get(x)(y, na.rm = TRUE),
setNames(rep(c('weighted.mean', 'weighted.mean'),2),names(.SD)), .SD),.SDcols=colsToKeep, by = id]
My question: how can get the arguments w=w_1
and w=w_2
into the setNames
-function? Is that even possible?
Upvotes: 1
Views: 1224
Reputation: 320
or this one:
dt[, Map(function(x,y,w) get(x)(y, w, na.rm = TRUE),
setNames(rep('weighted.mean',length(colsToKeep)), paste0("W_", colsToKeep)),
.SD[, ..colsToKeep], .SD[, ..colsToW]),
by = id]
Upvotes: 2
Reputation: 320
Could be something like this too:
colsToKeep <- c("col_a", "col_b", "col_c", "col_d")
colsToW <- c("w_1", "w_1", "w_2", "w_2")
eval(parse(text = paste0("dt[, .(", paste0("w_", colsToKeep, " = weighted.mean(", colsToKeep, ",", colsToW, ")", collapse = ", "), "), by = id]")))
Upvotes: 2
Reputation: 25223
As mentioned by Roland, you can cast into a long format. The benefit is that in the long run, you do not have to change the codes every time when there is a new column. Explanation in line. You can print mdt
to take a look.
#cast into a long format with col values in 1 column and rows in another columns
mdt <- melt(dt, id.var=c("id",grep("^w", names(dt), value=TRUE)),
variable.name="col", value.name="colVal")
mdt <- melt(mdt, id.var=c("id","col","colVal"),
variable.name="w", value.name="wVal")
#prob need to think of a programmatic way rather than typing columns
myPairs <- data.table(rbind(
c(col="col_a", w="w_1"),
c(col="col_b", w="w_1"),
c(col="col_c", w="w_2"),
c(col="col_d", w="w_2")))
#calculate weighted mean according to myPairs and then pivot the table
dcast(mdt[myPairs, on=.(col, w),
weighted.mean(colVal, wVal),
by=.(id, col)],
id ~ col,
value.var="V1")
Upvotes: 1