user1480926
user1480926

Reputation: 634

Grouped count aggregation in R data.table

table that contain dates, buy values and sell values. I'd like to count how many buys and sells per day and also total number of buys and sells. I'm finding this a little tricky to do in data.table.

   date   buy sell      
2011-01-01  1   0
2011-01-02  0   0
2011-01-03  0   2
2011-01-04  3   0
2011-01-05  0   0
2011-01-06  0   0
2011-01-01  0   0
2011-01-02  0   1
2011-01-03  4   0
2011-01-04  0   0
2011-01-05  0   0
2011-01-06  0   0
2011-01-01  0   0
2011-01-02  0   8
2011-01-03  2   0
2011-01-04  0   0
2011-01-05  0   0
2011-01-06  0   5

The above data.table can be created using the following code :

 DT = data.table(
          date=rep(as.Date('2011-01-01')+0:5,3) , 
          buy=c(1,0,0,3,0,0,0,0,4,0,0,0,0,0,2,0,0,0),
          sell=c(0,0,2,0,0,0,0,1,0,0,0,0,0,8,0,0,0,5));

What I want as a result is :

   date   total_buys   total_sells
2011-01-01    1            0
2011-01-02    0            2
                and so on  

Furthermore I'd also like to know the total number of buys and sells:

 total_buys   total_sells
     4            4

I have tried :

 length(DT[sell > 0 | buy > 0])
 > 3 

Which is a strange answer (would like to know why)

Upvotes: 3

Views: 9667

Answers (2)

A5C1D2H2I1M1N2O1R2T1
A5C1D2H2I1M1N2O1R2T1

Reputation: 193507

An alternative to @Jake's answer is the typical melt + dcast routine, something like:

library(reshape2)
dtL <- melt(DT, id.vars = "date")
dcast.data.table(dtL, date ~ variable, value.var = "value", 
                 fun.aggregate = function(x) sum(x > 0))
#         date buy sell
# 1 2011-01-01   1    0
# 2 2011-01-02   0    2
# 3 2011-01-03   2    1
# 4 2011-01-04   1    0
# 5 2011-01-05   0    0
# 6 2011-01-06   0    1

Or, without melting, just:

DT[, lapply(.SD, function(x) sum(x > 0)), by = date]

To get your other table, try:

dtL[, list(count = sum(value > 0)), by = variable]
#    variable count
# 1:      buy     4
# 2:     sell     4

Or, without melting:

DT[, lapply(.SD, function(x) sum(x > 0)), .SDcols = c("buy", "sell")]
#    buy sell
# 1:   4    4

Upvotes: 3

Jake Burkhead
Jake Burkhead

Reputation: 6535

## by date
DT[, list(total_buys = sum(buy > 0), total_sells = sum(sell > 0)), by = date]
##          date total_buys total_sells
## 1: 2011-01-01          1           0
## 2: 2011-01-02          0           2
## 3: 2011-01-03          2           1
## 4: 2011-01-04          1           0
## 5: 2011-01-05          0           0
## 6: 2011-01-06          0           1

DT[, list(total_buys = sum(buy > 0), total_sells = sum(sell > 0))]
##    total_buys total_sells
## 1:          4           4

Upvotes: 10

Related Questions