This is a follow-up to this question. I wonder how to use .SD in this problem, rather than doing the computations separately for each variable (here Y1 and Y2).
set.seed(12345)
A <- rep(x=paste0("A", 1:2), each=6)
B <- rep(x=paste0("B", 1:3), each=2, times=2)
Rep <- rep(x=1:2, times=3)
Y1 <- rnorm(n=12, mean = 50, sd = 5)
Y2 <- rnorm(n=12, mean = 50, sd = 10)
library(data.table)
dt <- data.table(A, B, Rep, Y1, Y2)
# Interaction effect for Y1, built up in four chained := steps
dt[, j = Eff1 := mean(Y1), .(A, B)][, j = Eff1 := Eff1 - mean(Y1), .(A)][, j = Eff1 := Eff1 - mean(Y1), .(B)][, j = Eff1 := Eff1 + mean(Y1)]
# The same four steps, repeated for Y2
dt[, j = Eff2 := mean(Y2), .(A, B)][, j = Eff2 := Eff2 - mean(Y2), .(A)][, j = Eff2 := Eff2 - mean(Y2), .(B)][, j = Eff2 := Eff2 + mean(Y2)]
# Collapse to one row per (A, B) cell
dt[, j = .(Eff1 = mean(Eff1), Eff2 = mean(Eff2)), by = .(A, B)]
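For reference, the chain assigns to every row in cell (A = a, B = b) the A×B interaction effect. Here \bar{y}_{ab} is the cell mean, \bar{y}_{a\cdot} and \bar{y}_{\cdot b} are the means over all rows with A = a and with B = b, and \bar{y} is the grand mean:

```latex
\mathrm{Eff}_{ab} = \bar{y}_{ab} - \bar{y}_{a\cdot} - \bar{y}_{\cdot b} + \bar{y}
```

This value is constant within each cell, so the final mean(Eff1) by (A, B) simply returns it once per cell.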
Personally, I would consider going outside data.table syntax for the grouped means and using base R's ave inside a single .SD call:
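my_cols = c("Y1", "Y2")       # response columns
tmp_cols = c("Eff1", "Eff2")  # columns that will hold the effects
# Step 1: for each response x, grand mean + (A, B) cell mean - A mean - B mean
dt[, (tmp_cols) := lapply(.SD, function(x) mean(x) + ave(x, A, B) - ave(x, A) - ave(x, B)),
   .SDcols = my_cols
# Step 2: one row per (A, B) cell; by = A:B is the column range A through B
][, lapply(.SD, mean), by = A:B, .SDcols = tmp_cols]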
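The ave-based formula can also be checked on its own in base R. This is a minimal sketch in which interaction_effect is a hypothetical helper name. In a balanced design like this one (2 replicates in every (A, B) cell), the interaction effects should be constant within each cell and sum to zero within every level of A and of B.

```r
# Hypothetical helper: A x B interaction effect for one numeric vector,
# using the same ave()-based formula as the answer above
interaction_effect <- function(x, A, B) {
  mean(x) + ave(x, A, B) - ave(x, A) - ave(x, B)
}

# Same data as in the question (Y1 only)
set.seed(12345)
A  <- rep(paste0("A", 1:2), each = 6)
B  <- rep(paste0("B", 1:3), each = 2, times = 2)
Y1 <- rnorm(12, mean = 50, sd = 5)

eff <- interaction_effect(Y1, A, B)

# Balanced design: effects sum to (numerically) zero within each level
print(tapply(eff, A, sum))
print(tapply(eff, B, sum))
```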
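A longer way, staying within data.table, is to compute the A, B and (A, B) means as separate tables and combine them with update joins: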
# Group means of each response at the A, B and (A, B) levels
dtA = dt[, lapply(.SD, mean), by=A, .SDcols = my_cols]
dtB = dt[, lapply(.SD, mean), by=B, .SDcols = my_cols]
dtAB = dt[, lapply(.SD, mean), by=.(A,B), .SDcols = my_cols]
# Start every effect column from the grand mean
dt[, (tmp_cols) := lapply(.SD, mean), .SDcols = my_cols]
# Update joins: add the cell mean, then subtract the A and B means
# (the i. prefix refers to columns of the joined table)
dt[dtAB,(tmp_cols) := Map(`+`, mget(tmp_cols), mget(paste0("i.", my_cols))), on=c("A","B")]
dt[dtA, (tmp_cols) := Map(`-`, mget(tmp_cols), mget(paste0("i.", my_cols))), on="A"]
dt[dtB, (tmp_cols) := Map(`-`, mget(tmp_cols), mget(paste0("i.", my_cols))), on="B"]
# One row per (A, B) cell
dt[, lapply(.SD, mean), by=.(A,B), .SDcols=tmp_cols]
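A middle ground, keeping the question's four chained steps but writing each one once for both columns via .SD, might look like the following sketch. It assumes data.table is installed and leaves out the unused Rep column. In the later steps, .SDcols holds both the effect and the response columns, so each pair is looked up by name with .SD[[...]] inside Map.

```r
library(data.table)

# Same data as in the question (Rep omitted, it is not used)
set.seed(12345)
A  <- rep(paste0("A", 1:2), each = 6)
B  <- rep(paste0("B", 1:3), each = 2, times = 2)
Y1 <- rnorm(12, mean = 50, sd = 5)
Y2 <- rnorm(12, mean = 50, sd = 10)
dt <- data.table(A, B, Y1, Y2)

my_cols  <- c("Y1", "Y2")
tmp_cols <- c("Eff1", "Eff2")

# Step 1: (A, B) cell means of every response column
dt[, (tmp_cols) := lapply(.SD, mean), by = .(A, B), .SDcols = my_cols]

# Steps 2 and 3: subtract the A-level means, then the B-level means
dt[, (tmp_cols) := Map(function(e, y) .SD[[e]] - mean(.SD[[y]]), tmp_cols, my_cols),
   by = A, .SDcols = c(tmp_cols, my_cols)]
dt[, (tmp_cols) := Map(function(e, y) .SD[[e]] - mean(.SD[[y]]), tmp_cols, my_cols),
   by = B, .SDcols = c(tmp_cols, my_cols)]

# Step 4: add back the grand mean (no grouping)
dt[, (tmp_cols) := Map(function(e, y) .SD[[e]] + mean(.SD[[y]]), tmp_cols, my_cols),
   .SDcols = c(tmp_cols, my_cols)]

# One row per (A, B) cell, as in the question
res <- dt[, lapply(.SD, mean), by = .(A, B), .SDcols = tmp_cols]
print(res)
```

Step 1 can use plain lapply(.SD, mean) because the effect columns do not exist yet. The later steps need both columns of each pair, which is why they pair the names with Map instead.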