La Machine Infernale
La Machine Infernale

Reputation: 569

Multiple level by aggregation data.table (R)

it might be that this is trivial, but I'm probably confused.

I have something like this:

set.seed(1234) 

dt<-data.table(day=sample(c("day1","day2","day3"),20,replace = T),
store=sample(c("store1","store2","store3"),20,replace=T), 
x=rnorm(20,33,6),y=rnorm(20,12,10))

I am interested in aggregating by day and store:

dt[,.(sumx=sum(x),sumy=sum(y)),by=c("day","store")]

    day  store      sumx     sumy
1: day1 store2  56.33890 44.52312
2: day2 store1 164.72854 61.37866
3: day3 store3 144.52483 53.74347
4: day1 store3  56.25504 34.00066
5: day3 store1  70.61311 30.85589
6: day2 store3 123.34534 74.67024
7: day2 store2  35.72952 21.19009

But also, more globally, by day only:

dt[,.(sumx=sum(x),sumy=sum(y)),by=day]

    day     sumx      sumy
1: day1 112.5939  78.52378
2: day2 323.8034 157.23899
3: day3 215.1379  84.59936

In practice, I would like to end up with a dataset that has aggregation per day and store, and an additional column with day only aggregation:

    day  store      sumx     sumy sumx_daylevel sumy_daylevel
1: day1 store2  56.33890 44.52312    112.5939    78.52378
2: day2 store1 164.72854 61.37866    323.8034    157.23899
3: day3 store3 144.52483 53.74347    215.1379    84.59936
4: day1 store3  56.25504 34.00066    112.5939    78.52378
5: day3 store1  70.61311 30.85589    215.1379    84.59936
6: day2 store3 123.34534 74.67024    323.8034    157.23899
7: day2 store2  35.72952 21.19009    323.8034    157.23899

I'd like to achieve that wrapping everything in a function, no merging. Any help will be appreciated. Thank you

Upvotes: 1

Views: 328

Answers (1)

akrun
akrun

Reputation: 887531

We can use := to create new columns

dt[,.(sumx=sum(x),sumy=sum(y)),by=c("day","store")
     ][, c("sumx_daylevel", "sumy_daylevel") := .(sum(sumx), sum(sumy)), day][]

Upvotes: 2

Related Questions