Reputation: 124
I have stock data at the tick level and would like to create a rolling list of all ticks for the previous 10 seconds. The code below works, but takes a very long time for large amounts of data. I'd like to vectorize this process or otherwise make it faster, but I'm not coming up with anything. Any suggestions or nudges in the right direction would be appreciated.
library(quantmod)
set.seed(150)
# Create five minutes of xts example data at .1 second intervals
mins <- 5
ticks <- mins * 60 * 10 + 1
times <- xts(runif(ticks, 1, 100), order.by = seq(as.POSIXct("1973-03-17 09:00:00"),
as.POSIXct("1973-03-17 09:05:00"), length.out = ticks))
# Randomly remove some ticks to create unequal intervals
times <- times[runif(seq_along(times))>.3]
# Number of seconds to look back
lookback <- 10
# Preallocate one list slot per tick
dist.list <- vector("list", nrow(times))
system.time(
for (i in 1:length(times)) {
dist.list[[i]] <- times[paste(strptime(index(times[i])-(lookback-1), format = "%Y-%m-%d %H:%M:%S"), "/",
strptime(index(times[i])-1, format = "%Y-%m-%d %H:%M:%S"), sep = "")]
}
)
#   user  system elapsed
#   6.12    0.00    5.85
Upvotes: 2
Views: 172
Reputation: 18323
You should check out the window function; it makes subsetting by date much easier. The following code uses lapply to do the work of the for loop.
# Your code
system.time(
for (i in 1:length(times)) {
dist.list[[i]] <- times[paste(strptime(index(times[i])-(lookback-1), format = "%Y-%m-%d %H:%M:%S"), "/",
strptime(index(times[i])-1, format = "%Y-%m-%d %H:%M:%S"), sep = "")]
}
)
# user system elapsed
# 10.09 0.00 10.11
# My code
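# Note: x - lookback - 1 is 11 seconds before x, and end = x includes the current tick.
# Use start = x - (lookback - 1), end = x - 1 to stay closer to the loop above.
system.time(
dist.list <- lapply(index(times),
function(x) window(times, start = x - lookback - 1, end = x))
)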
# user system elapsed
# 3.02 0.00 3.03
So it takes about a third of the time.
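A different route, not taken from the code above and offered only as a sketch: because the timestamps are sorted, the first and last position of every window can be found for all ticks at once with base R's findInterval, and each window then becomes a plain integer slice. The names tick_times, values and window_bounds are made up for illustration, and the window is assumed to be [t - lookback, t), which is one reading of "previous 10 seconds".

```r
# Sketch: vectorised window bounds on a sorted time vector (base R only).
# tick_times, values and window_bounds are hypothetical names.
set.seed(1)
tick_times <- as.POSIXct("1973-03-17 09:00:00", tz = "UTC") + sort(runif(200, 0, 60))
values <- runif(length(tick_times), 1, 100)
lookback <- 10  # seconds

window_bounds <- function(t, lookback) {
  t <- as.numeric(t)
  # With left.open = TRUE, findInterval counts the elements strictly below x.
  # First position whose time is >= t[i] - lookback:
  first <- findInterval(t - lookback, t, left.open = TRUE) + 1L
  # Last position whose time is strictly before t[i]:
  last <- findInterval(t, t, left.open = TRUE)
  data.frame(first = first, last = last)
}

b <- window_bounds(tick_times, lookback)

# Slice by position; seq_len() yields an empty window when last == first - 1.
windows <- Map(function(f, l) values[seq_len(l - f + 1L) + f - 1L],
               b$first, b$last)
```

With the xts object from the question, the same bounds could presumably be computed from index(times) and applied as times[f:l], with a guard for empty windows; that part is untested here.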
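But if you really want to speed things up, and you are willing to forgo millisecond accuracy (which I think your original method implicitly does), you could run the loop only on the unique timestamps truncated to the second, because every tick within the same second returns the same time window. In the timings here that is roughly 25 times faster than the original loop (0.37 s versus 10.09 s), or about 8 times faster than the lapply version. Note that the result has one element per unique second rather than one per tick: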
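# Drop the fractional seconds. trunc() is used here instead of an as.character()
# round trip, which can keep the fractional seconds in newer R versions.
dat.time <- unique(as.POSIXct(trunc(index(times), units = "secs")))
# Note: x - lookback - 1 is 11 seconds before x, and end = x includes that second itself.
system.time(dist.list.2 <- lapply(dat.time, function(x) window(times, start = x - lookback - 1, end = x)))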
# user system elapsed
# 0.37 0.00 0.39
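Since the per-second list is shorter than the tick series, each tick still has to be pointed at the entry for its second. One way to do that, sketched in base R with made-up data and hypothetical names (tick_times, values, tick_secs, sec_id, per_second, per_tick), is to truncate every tick to its second and use match to look up the position in the unique-second vector:

```r
# Sketch: compute once per unique second, then look the result up per tick.
# All names and the toy data are illustrative assumptions.
set.seed(2)
tick_times <- as.POSIXct("1973-03-17 09:00:00", tz = "UTC") + sort(runif(50, 0, 5))
values <- seq_along(tick_times)
lookback <- 2  # seconds, kept small for the toy data

# Whole-second timestamp of each tick, and the distinct seconds
tick_secs <- as.POSIXct(trunc(tick_times, units = "secs"))
unique_secs <- unique(tick_secs)

# Position of each tick's second within unique_secs
sec_id <- match(as.numeric(tick_secs), as.numeric(unique_secs))

# The expensive step runs once per second (a simple window stands in for it here)
per_second <- lapply(seq_along(unique_secs), function(k) {
  s <- unique_secs[k]
  values[tick_times >= s - lookback & tick_times <= s]
})

# One entry per tick again; ticks in the same second share the same window
per_tick <- per_second[sec_id]
```

This duplicates references to the same windows rather than recomputing them, so the cost of the window extraction stays proportional to the number of distinct seconds.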
Upvotes: 3