user2298382
user2298382

Reputation: 297

R: Combined Aggregation

lets say we have the following.:

time=c(20060200,20060200,20060200,20060200,20060200,20060300,20060400,20060400,20060400)
bucket=c(1,1,2,2,1,3,3,3,1)
rate=c(0.05,0.04,0.04,0.05,0.06,0.01,0.07,0.08,0.03)




       time bucket rate
1: 20060200      1 0.05
2: 20060200      1 0.04
3: 20060200      2 0.04
4: 20060200      2 0.05
5: 20060200      1 0.06
6: 20060300      3 0.01
7: 20060400      3 0.07
8: 20060400      3 0.08
9: 20060400      1 0.03

i know how to aggregate the rate to time or bucket by something like this

test=data.table(time,bucket,rate)
b=test[,list(x=sum(rate)),by=bucket]

my question is how to aggregate to the bucket, while keeping the time intact.
so what i want is something like this:

20060200  1  0.15
20060200  2  0.09
20060200  3  0
20060300  1  0
20060300  2  0
20060300  3  0.01 
20060400  1  0.03
20060400  2  0
20060400  3  0.15

hope this is clear, thanks

Upvotes: 0

Views: 149

Answers (2)

Frank
Frank

Reputation: 66819

As @Mittenchops said, you're looking for the Cartesian product. There's a function for this, CJ. You can get the combos you want with unique(CJ(time,bucket)). To use this with your data.table, you can (i) set the key and (ii) join it with the CJ:

setkey(test,time,bucket)
b <- test[unique(CJ(time,bucket)),list(x=sum(rate))]
b[is.na(x),x:=0]

The last step sets missing values to 0. The result is:

       time bucket    x
1: 20060200      1 0.15
2: 20060200      2 0.09
3: 20060200      3 0.00
4: 20060300      1 0.00
5: 20060300      2 0.00
6: 20060300      3 0.01
7: 20060400      1 0.03
8: 20060400      2 0.00
9: 20060400      3 0.15

By the way, when you "join" using x[y,...] syntax (where x and y are both data.tables), there is a hidden by...a by-without-by... on (possibly only the first part of) x's key. Look up "by-without-by" in the documentation or on google for details.

Upvotes: 5

Mittenchops
Mittenchops

Reputation: 19724

It sounds like the thing that makes your question difficult is less about aggregating, and more about creating the cartesian product of times by groups to fill in the gaps left by the aggregate. It would be great if there were a flag in the function to accomplish this, but there doesn't seem to be.

So, this isn't elegant, but here's one solution that accomplishes that, by constructing that structure, then grafting the results of aggregation onto that scaffolding:

df <- aggregate(rate~., data=test, sum)
> df
      time bucket rate
1 20060200      1 0.15
2 20060400      1 0.03
3 20060200      2 0.09
4 20060300      3 0.01
5 20060400      3 0.15

Figure out what levels we need to create our Cartesian scaffolding, in this case, all times by all groups:

> levels(factor(bucket))
[1] "1" "2" "3"
> levels(factor(time))
[1] "20060200" "20060300" "20060400"
> B <- levels(factor(bucket))
> t <- levels(factor(time))

Make a lattice base to graft the results onto:

> base <- expand.grid(B,t)
> names(base) <-c("bucket","time")
> base
  bucket     time
1      1 20060200
2      2 20060200
3      3 20060200
4      1 20060300
5      2 20060300
6      3 20060300
7      1 20060400
8      2 20060400
9      3 20060400

Merge the dataframe onto the base:

> m <- merge(base,df,all.x=T)
  bucket     time rate
1      1 20060200 0.15
2      1 20060300   NA
3      1 20060400 0.03
4      2 20060200 0.09
5      2 20060300   NA
6      2 20060400   NA
7      3 20060200   NA
8      3 20060300 0.01
9      3 20060400 0.15

Replace NA with 0s:

m$rate[is.na(m$rate)] <- 0
> m
  bucket     time rate
1      1 20060200 0.15
2      1 20060300 0.00
3      1 20060400 0.03
4      2 20060200 0.09
5      2 20060300 0.00
6      2 20060400 0.00
7      3 20060200 0.00
8      3 20060300 0.01
9      3 20060400 0.15

Sort to get your desired output:

> m[with(m,order(time,bucket)),]
  bucket     time rate
1      1 20060200 0.15
4      2 20060200 0.09
7      3 20060200 0.00
2      1 20060300 0.00
5      2 20060300 0.00
8      3 20060300 0.01
3      1 20060400 0.03
6      2 20060400 0.00
9      3 20060400 0.15

Upvotes: 0

Related Questions