Reputation: 1069
I'm new to using data tables and would like some help aggregating some data.
Login OpenTime CloseTime OpenedValueUSD ClosedValueUSD Year Month TransferredValue Identifier
859 04/02/2014 07:55 05/02/2014 15:37 10000 10000 2014 2 0 1
859 07/02/2014 03:16 07/02/2014 03:51 8960.755 8960.755 2014 2 0 2
859 11/02/2014 12:41 13/02/2014 11:56 13635.178 13606.901 2014 2 0 3
859 11/02/2014 13:34 11/02/2014 15:34 13635.178 13635.178 2014 2 13635.178 4
859 12/02/2014 13:46 14/02/2014 09:59 13660.246 13649.278 2014 2 13635.178 5
859 13/02/2014 15:33 13/02/2014 15:42 13606.901 13606.901 2014 2 13660.246 6
859 25/03/2014 14:52 26/03/2014 12:58 10000 10000 2014 3 0 7
For each row, I would like to aggregate all trades that opened prior to that trade and close after that trade is opened. For example, the trade in the third row opened prior to the trade in the fourth, but only closed after the fourth trade opened. So I then take the OpenedValueUSD for that trade (and any other appropriate trades (none, in this case)) and place it in the TransferredValue column.
Here is current code:
tradeData[,TransferredValue:=sum(tradeData$OpenedValueUSD[OpenTime <
tradeData$OpenTime & CloseTime > tradeData$OpenTime & Login ==
tradeData$Login]), by="Identifier"]
Upvotes: 3
Views: 168
Reputation: 5675
This should produce the expected result:
tradeData[,OpenTime:=as.POSIXct(OpenTime,format="%d/%m/%Y %H:%M")]
tradeData[,CloseTime:=as.POSIXct(CloseTime,format="%d/%m/%Y %H:%M")]
tradeData[,TransferredValue:=sum(tradeData$OpenedValueUSD[tradeData$OpenTime < OpenTime &
tradeData$CloseTime > OpenTime]), by = 'Identifier']
tradeData
# Login OpenTime CloseTime OpenedValueUSD ClosedValueUSD Year Month
# 1: 859 2014-02-04 07:55:00 2014-02-05 15:37:00 10000.000 10000.000 2014 2
# 2: 859 2014-02-07 03:16:00 2014-02-07 03:51:00 8960.755 8960.755 2014 2
# 3: 859 2014-02-11 12:41:00 2014-02-13 11:56:00 13635.178 13606.901 2014 2
# 4: 859 2014-02-11 13:34:00 2014-02-11 15:34:00 13635.178 13635.178 2014 2
# 5: 859 2014-02-12 13:46:00 2014-02-14 09:59:00 13660.246 13649.278 2014 2
# 6: 859 2014-02-13 15:33:00 2014-02-13 15:42:00 13606.901 13606.901 2014 2
# 7: 859 2014-03-25 14:52:00 2014-03-26 12:58:00 10000.000 10000.000 2014 3
# Identifier TransferredValue
# 1: 1 0.00
# 2: 2 0.00
# 3: 3 0.00
# 4: 4 13635.18
# 5: 5 13635.18
# 6: 6 13660.25
# 7: 7 0.00
Data:
tradeData <- data.table(Login = c(859, 859, 859, 859, 859, 859, 859),
OpenTime = c("04/02/2014 07:55", "07/02/2014 03:16", "11/02/2014 12:41", "11/02/2014 13:34", "12/02/2014 13:46",
"13/02/2014 15:33", "25/03/2014 14:52"),
CloseTime = c("05/02/2014 15:37", "07/02/2014 03:51", "13/02/2014 11:56", "11/02/2014 15:34", "14/02/2014 09:59",
"13/02/2014 15:42", "26/03/2014 12:58"),
OpenedValueUSD = c(10000.000, 8960.755, 13635.178, 13635.178, 13660.246, 13606.901, 10000.000),
ClosedValueUSD = c(10000.000, 8960.755, 13606.901, 13635.178, 13649.278, 13606.901, 10000.000),
Year = c(2014, 2014, 2014, 2014, 2014, 2014, 2014),
Month = c(2, 2, 2, 2, 2, 2, 3),
Identifier = c(1, 2, 3, 4, 5, 6, 7))
Upvotes: 3
Reputation: 118889
Here's another way using foverlaps()
which doesn't require row-wise grouping. I'll call your data.table dt
.
Convert OpenTime
and CloseTime
to POSIXct format, as shown by @alex23lemm.
Add a temporary column tmpTime
which is equal to OpenTime
. We will use this in foverlaps()
.
dt[, tmpTime := OpenTime]
setkey()
on Login, OpenTime, CloseTime
colums.
setkey(dt, Login, OpenTime, CloseTime)
Using foverlaps()
, we will now get which intervals in Login, OpenTime, tmpTime
fall entirely within Login, OpenTime, CloseTime
.
olaps = foverlaps(dt, dt, by.x=c("Login", "OpenTime", "tmpTime"),
which=TRUE, nomatch=0L, type="within")
by.y
is automatically taken to be the key columns.
Remove self-overlaps, i.e., remove those where xid == yid
.
olaps = olaps[xid != yid]
# xid yid
# 1: 4 3
# 2: 5 3
# 3: 6 5
Assign to xid
rows the values corresponding to yid
. And remove tmpTime
.
dt[olaps$xid, TransferredValue :=
dt$OpenedValueUSD[olaps$yid]][, tmpTime := NULL]
# Login OpenTime CloseTime OpenedValueUSD ClosedValueUSD Year Month TransferredValue Identifier
# 1: 859 2014-02-04 07:55:00 2014-02-05 15:37:00 10000.000 10000.000 2014 2 0.00 1
# 2: 859 2014-02-07 03:16:00 2014-02-07 03:51:00 8960.755 8960.755 2014 2 0.00 2
# 3: 859 2014-02-11 12:41:00 2014-02-13 11:56:00 13635.178 13606.901 2014 2 0.00 3
# 4: 859 2014-02-11 13:34:00 2014-02-11 15:34:00 13635.178 13635.178 2014 2 13635.18 4
# 5: 859 2014-02-12 13:46:00 2014-02-14 09:59:00 13660.246 13649.278 2014 2 13635.18 5
# 6: 859 2014-02-13 15:33:00 2014-02-13 15:42:00 13606.901 13606.901 2014 2 13660.25 6
# 7: 859 2014-03-25 14:52:00 2014-03-26 12:58:00 10000.000 10000.000 2014 3 0.00 7
Upvotes: 7