Reputation: 297
Let's say we have the following:
time=c(20060200,20060200,20060200,20060200,20060200,20060300,20060400,20060400,20060400)
bucket=c(1,1,2,2,1,3,3,3,1)
rate=c(0.05,0.04,0.04,0.05,0.06,0.01,0.07,0.08,0.03)
time bucket rate
1: 20060200 1 0.05
2: 20060200 1 0.04
3: 20060200 2 0.04
4: 20060200 2 0.05
5: 20060200 1 0.06
6: 20060300 3 0.01
7: 20060400 3 0.07
8: 20060400 3 0.08
9: 20060400 1 0.03
I know how to aggregate the rate by time or bucket with something like this:
test=data.table(time,bucket,rate)
b=test[,list(x=sum(rate)),by=bucket]
My question is how to aggregate by bucket while keeping the time intact. So what I want is something like this:
20060200 1 0.15
20060200 2 0.09
20060200 3 0
20060300 1 0
20060300 2 0
20060300 3 0.01
20060400 1 0.03
20060400 2 0
20060400 3 0.15
Hope this is clear, thanks.
Upvotes: 0
Views: 149
Reputation: 66819
As @Mittenchops said, you're looking for the Cartesian product. There's a function for this, CJ. You can get the combos you want with unique(CJ(time,bucket)). To use this with your data.table, you can (i) set the key and (ii) join it with the CJ:
setkey(test,time,bucket)
# by=.EACHI makes sum(rate) run once per row of the CJ table
# (required in data.table >= 1.9.4; older versions grouped implicitly)
b <- test[unique(CJ(time,bucket)),list(x=sum(rate)),by=.EACHI]
b[is.na(x),x:=0]
The last step sets missing values to 0. The result is:
time bucket x
1: 20060200 1 0.15
2: 20060200 2 0.09
3: 20060200 3 0.00
4: 20060300 1 0.00
5: 20060300 2 0.00
6: 20060300 3 0.01
7: 20060400 1 0.03
8: 20060400 2 0.00
9: 20060400 3 0.15
By the way, when you "join" using x[y,...] syntax (where x and y are both data.tables), older versions of data.table applied a hidden by -- a "by-without-by" -- on (possibly only the first part of) x's key. In current versions that grouping must be requested explicitly with by=.EACHI. Look up "by-without-by" and .EACHI in the documentation for details.
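For reference, here is a self-contained sketch of the same idea using an ad-hoc on= join (available in data.table 1.9.6+), which avoids setkey entirely; the names inside CJ(...) are just labels used for the join:

```r
library(data.table)

# The question's data
time   <- c(20060200,20060200,20060200,20060200,20060200,20060300,20060400,20060400,20060400)
bucket <- c(1,1,2,2,1,3,3,3,1)
rate   <- c(0.05,0.04,0.04,0.05,0.06,0.01,0.07,0.08,0.03)
test   <- data.table(time, bucket, rate)

# Join test onto the full time x bucket grid; by=.EACHI sums rate per grid cell,
# and unmatched cells come back NA, which we then overwrite with 0.
b <- test[CJ(time = unique(time), bucket = unique(bucket)),
          .(x = sum(rate)), by = .EACHI, on = .(time, bucket)]
b[is.na(x), x := 0]
```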
Upvotes: 5
Reputation: 19724
It sounds like the thing that makes your question difficult is less the aggregating and more creating the Cartesian product of times by buckets, to fill in the gaps the aggregate leaves. It would be great if there were a flag in the function to accomplish this, but there doesn't seem to be one.
So this isn't elegant, but here's one solution that builds that full structure first, then grafts the results of the aggregation onto that scaffolding:
df <- aggregate(rate~., data=test, sum)
> df
time bucket rate
1 20060200 1 0.15
2 20060400 1 0.03
3 20060200 2 0.09
4 20060300 3 0.01
5 20060400 3 0.15
Figure out what levels we need to build the Cartesian scaffolding, in this case all times crossed with all buckets:
> levels(factor(bucket))
[1] "1" "2" "3"
> levels(factor(time))
[1] "20060200" "20060300" "20060400"
> B <- levels(factor(bucket))
> t <- levels(factor(time))
Make a lattice base to graft the results onto:
> base <- expand.grid(B,t)
> names(base) <-c("bucket","time")
> base
bucket time
1 1 20060200
2 2 20060200
3 3 20060200
4 1 20060300
5 2 20060300
6 3 20060300
7 1 20060400
8 2 20060400
9 3 20060400
Merge the data frame onto the base; all.x=TRUE keeps every row of the scaffolding, leaving NA where there was no data:
> m <- merge(base,df,all.x=TRUE)
> m
bucket time rate
1 1 20060200 0.15
2 1 20060300 NA
3 1 20060400 0.03
4 2 20060200 0.09
5 2 20060300 NA
6 2 20060400 NA
7 3 20060200 NA
8 3 20060300 0.01
9 3 20060400 0.15
Replace NA with 0s:
m$rate[is.na(m$rate)] <- 0
> m
bucket time rate
1 1 20060200 0.15
2 1 20060300 0.00
3 1 20060400 0.03
4 2 20060200 0.09
5 2 20060300 0.00
6 2 20060400 0.00
7 3 20060200 0.00
8 3 20060300 0.01
9 3 20060400 0.15
Sort to get your desired output:
> m[with(m,order(time,bucket)),]
bucket time rate
1 1 20060200 0.15
4 2 20060200 0.09
7 3 20060200 0.00
2 1 20060300 0.00
5 2 20060300 0.00
8 3 20060300 0.01
3 1 20060400 0.03
6 2 20060400 0.00
9 3 20060400 0.15
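Put together, the whole base-R recipe above comes down to a few lines (a condensed sketch of the same steps, using the question's data):

```r
# The question's data, as a plain data.frame
time   <- c(20060200,20060200,20060200,20060200,20060200,20060300,20060400,20060400,20060400)
bucket <- c(1,1,2,2,1,3,3,3,1)
rate   <- c(0.05,0.04,0.04,0.05,0.06,0.01,0.07,0.08,0.03)
test   <- data.frame(time, bucket, rate)

df   <- aggregate(rate ~ time + bucket, data = test, sum)         # collapse duplicates
base <- expand.grid(bucket = unique(bucket), time = unique(time)) # full scaffolding
m    <- merge(base, df, all.x = TRUE)                             # graft results on
m$rate[is.na(m$rate)] <- 0                                        # fill the gaps
m    <- m[order(m$time, m$bucket), ]                              # desired ordering
```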
Upvotes: 0