Bryan S
Bryan S

Reputation: 124

Rolling list over unequal times in XTS

I have stock data at the tick level and would like to create a rolling list of all ticks for the previous 10 seconds. The code below works, but takes a very long time for large amounts of data. I'd like to vectorize this process or otherwise make it faster, but I'm not coming up with anything. Any suggestions or nudges in the right direction would be appreciated.

library(quantmod)
set.seed(150)

# Create five minutes of xts example data at .1 second intervals
mins  <- 5
ticks <- mins * 60 * 10 + 1


times <- xts(runif(seq_len(ticks),1,100), order.by=seq(as.POSIXct("1973-03-17 09:00:00"),
                                                       as.POSIXct("1973-03-17 09:05:00"), length = ticks))

# Randomly remove some ticks to create unequal intervals
times <- times[runif(seq_along(times))>.3]

# Number of seconds to look back
lookback  <- 10
dist.list <- list(rep(NA, nrow(times)))

system.time(
  for (i in 1:length(times)) {

    dist.list[[i]] <- times[paste(strptime(index(times[i])-(lookback-1), format = "%Y-%m-%d %H:%M:%S"), "/",
                                  strptime(index(times[i])-1, format = "%Y-%m-%d %H:%M:%S"), sep = "")]
  }
)
>  user  system elapsed 
   6.12    0.00    5.85 

Upvotes: 2

Views: 172

Answers (1)

nograpes
nograpes

Reputation: 18323

You should check out the window function, it will make your subselection of dates a lot easier. The following code uses lapply to do the work of the for loop.

# Your code
system.time(
  for (i in 1:length(times)) {

    dist.list[[i]] <- times[paste(strptime(index(times[i])-(lookback-1), format = "%Y-%m-%d %H:%M:%S"), "/",
                                  strptime(index(times[i])-1, format = "%Y-%m-%d %H:%M:%S"), sep = "")]
  }
  )

#    user  system elapsed 
#    10.09    0.00   10.11

# My code 
system.time(dist.list<-lapply(index(times),
    function(x) window(times,start=x-lookback-1,end=x))
)
#    user  system elapsed 
#    3.02    0.00    3.03 

So, about a third faster.

But, if you really want to speed things up, and you are willing to forgo millisecond accuracy (which I think your original method implicitly does), you could just run the loop on unique date-hour-second combinations, because they will all return the same time window. This should speed things up roughly twenty or thirty times:

dat.time=unique(as.POSIXct(as.character(index(times)))) # Cheesy method to drop the ms.
system.time(dist.list.2<-lapply(dat.time,function(x) window(times,start=x-lookback-1,end=x)))

# user  system elapsed 
# 0.37    0.00    0.39 

Upvotes: 3

Related Questions