How to create multiple new columns with data.table's j over different subsets

Question

I want to create multiple variables that are aggregating various subsets of a dataset. For an illustrating example, say you have the following data:

DT = data.table(Group1 = c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,4,4), 
                Group2 = c(1,1,1,2,2,1,1,2,2,2,1,1,1,1,2,1,1,2,2,2), 
                  Var1 = c(1,1,0,0,0,1,1,0,1,0,1,0,0,0,0,0,0,0,0,0))

I want to find several averages of variable Var1. I want to know:

mean(Var1) grouped by Group1
mean(Var1) for only those with Group2 == 1, grouped by Group1
mean(Var1) for only those with Group2 == 2, grouped by Group1

Or, in data.table parlance,

DT[, mean(Var1), by=Group1]
DT[Group2==1, mean(Var1), by=Group1]
DT[Group2==2, mean(Var1), by=Group1]

Obviously, calculating any one of these is very straightforward. But I can't find a good way to calculate all three of them, since they use different subsets in i. The solution I've been using so far is generating them individually, then merging them into a unified table.

DT_all <- DT[, .(avgVar1_all = mean(Var1)), by = Group1]
DT_1 <- DT[Group2 == 1, .(avgVar1_1 = mean(Var1)), by = Group1]
DT_2 <- DT[Group2 == 2, .(avgVar1_2 = mean(Var1)), by = Group1]
group_info <- merge(DT_all, DT_1, by = "Group1")
group_info <- merge(group_info, DT_2, by = "Group1")

group_info
#    Group1 avgVar1_all avgVar1_1 avgVar1_2
# 1:      1         0.4 0.6666667 0.0000000
# 2:      2         0.6 1.0000000 0.3333333
# 3:      3         0.2 0.2500000 0.0000000
# 4:      4         0.0 0.0000000 0.0000000

Is there a more elegant method I could be using?

thelatemail · Accepted Answer

Just do it all in one grouping operation using .SD:

DT[, .(
        all  = mean(Var1),
        grp1 = .SD[Group2==1, mean(Var1)],
        grp2 = .SD[Group2==2, mean(Var1)]
      ),
  by = Group1,
  .SDcols=c("Group2","Var1")
  ]

#   Group1 all      grp1      grp2
#1:      1 0.4 0.6666667 0.0000000
#2:      2 0.6 1.0000000 0.3333333
#3:      3 0.2 0.2500000 0.0000000
#4:      4 0.0 0.0000000 0.0000000

How to create multiple new columns with data.table's j over different subsets

Answers (2)

Related Questions

How to create multiple new columns with data.table&#39;s j over different subsets

Answers (2)

Related Questions

How to create multiple new columns with data.table's j over different subsets