Reputation: 634
table that contain dates, buy values and sell values. I'd like to count how many buys and sells per day and also total number of buys and sells. I'm finding this a little tricky to do in data.table.
date buy sell
2011-01-01 1 0
2011-01-02 0 0
2011-01-03 0 2
2011-01-04 3 0
2011-01-05 0 0
2011-01-06 0 0
2011-01-01 0 0
2011-01-02 0 1
2011-01-03 4 0
2011-01-04 0 0
2011-01-05 0 0
2011-01-06 0 0
2011-01-01 0 0
2011-01-02 0 8
2011-01-03 2 0
2011-01-04 0 0
2011-01-05 0 0
2011-01-06 0 5
The above data.table can be created using the following code :
DT = data.table(
date=rep(as.Date('2011-01-01')+0:5,3) ,
buy=c(1,0,0,3,0,0,0,0,4,0,0,0,0,0,2,0,0,0),
sell=c(0,0,2,0,0,0,0,1,0,0,0,0,0,8,0,0,0,5));
What I want as a result is :
date total_buys total_sells
2011-01-01 1 0
2011-01-02 0 2
and so on
Furthermore I'd also like to know the total number of buys and sells:
total_buys total_sells
4 4
I have tried :
length(DT[sell > 0 | buy > 0])
> 3
Which is a strange answer (would like to know why)
Upvotes: 3
Views: 9667
Reputation: 193507
An alternative to @Jake's answer is the typical melt
+ dcast
routine, something like:
library(reshape2)
dtL <- melt(DT, id.vars = "date")
dcast.data.table(dtL, date ~ variable, value.var = "value",
fun.aggregate = function(x) sum(x > 0))
# date buy sell
# 1 2011-01-01 1 0
# 2 2011-01-02 0 2
# 3 2011-01-03 2 1
# 4 2011-01-04 1 0
# 5 2011-01-05 0 0
# 6 2011-01-06 0 1
Or, without melting, just:
DT[, lapply(.SD, function(x) sum(x > 0)), by = date]
To get your other table, try:
dtL[, list(count = sum(value > 0)), by = variable]
# variable count
# 1: buy 4
# 2: sell 4
Or, without melting:
DT[, lapply(.SD, function(x) sum(x > 0)), .SDcols = c("buy", "sell")]
# buy sell
# 1: 4 4
Upvotes: 3
Reputation: 6535
## by date
DT[, list(total_buys = sum(buy > 0), total_sells = sum(sell > 0)), by = date]
## date total_buys total_sells
## 1: 2011-01-01 1 0
## 2: 2011-01-02 0 2
## 3: 2011-01-03 2 1
## 4: 2011-01-04 1 0
## 5: 2011-01-05 0 0
## 6: 2011-01-06 0 1
DT[, list(total_buys = sum(buy > 0), total_sells = sum(sell > 0))]
## total_buys total_sells
## 1: 4 4
Upvotes: 10