Reputation: 185
I have a data table like as below, and I would like to aggregate all the values within a 5s time range of another vector of timestamp.
If my description confuses you, please see the following toy example
trade_hist <-data.table(timestamp=seq(1,200,2),value=rep(1:5,20))
TS <- seq(2,200,2)
> head(trade_hist)
timestamp value
1: 1 1
2: 3 2
3: 5 3
4: 7 4
5: 9 5
6: 11 1
> head(TS)
[1] 2 4 6 8 10 12
I eventually want to create new table with timetamp as in TS and aggregate the values in original table with certain condition (say sum up all values within 5s time range prior to that time stamp). I can accomplish this with the following code:
list_of_values <- vector("list",length(TS))
for(i in 1:length(TS)){
ts <- TS[i]
tmp <- trade_hist[(ts-timestamp <= 5) & ts-timestamp > 0]
tmp <- tmp[,.(sumVal=sum(value))]
list_of_values[[i]] <- tmp
}
newcol <- rbindlist(list_of_values)
result_wanted <- cbind(TS,newcol)
> head(result_wanted)
TS sumVal
1: 2 1
2: 4 3
3: 6 6
4: 8 9
5: 10 12
6: 12 10
For example, when TS=6, look back to the original table for 5s range prior to that we have value=1,2,3 thereby the sum is 6; when TS=8, look back to the original table we have value=2,3,4, sum is 9.
However the code above is extremely inefficient for bigger table, is there some other way around? Thx!
Upvotes: 2
Views: 112
Reputation: 185
Thanks @MichaelChirico for suggesting forverlap I wrote a solution for my question using forverlap and it is really quick!
x <- data.table(start=TS,end=TS)
y <- trade_hist[,.(start=timestamp,end=timestamp+5,value)]
setkey(y, start, end)
tmp <- foverlaps(x, y, type="within")
result <- tmp[,.(value=sum(value)),by=i.start]
setnames(result,"i.start","timestamp")
Upvotes: 2
Reputation: 43334
With dplyr
,
sumVal <- sapply(TS, function(x){
sum(trade_hist %>%
filter(timestamp >= x-5, timestamp <= x) %>%
select(value))
})
result_wanted <- data.frame(TS, sumVal)
which will be considerably faster than the original. data.table
or Rcpp
could probably make it faster yet, if your data is truly huge.
Keep in mind that if TS
and trade_hist$timestamp
are actually date-time objects, they need to be in a format that can handle subtraction and greater than/less than operators, or those operations need to be changed to a suitable one. lubridate
or chron
probably has what you need if the base POSIXlt
/POSIXct
isn't cutting it.
Upvotes: 1