Little Code

Reputation: 1545

Correlation using rolling window on second vector

I'm a bit of an R newbie and am a little stuck on how to run a correlation on time-series data where the second vector is much longer and I want to use a rolling time window.

My data looks something like this:

set.seed(1)
# "Target sample"  (this is always of known fixed length N, e.g. 20 )
target <- data.frame(Date=rep(seq(Sys.Date(),by="1 day",length=20)),Measurement=rnorm(2))

# "Potential Sample" (this is always much longer and of unknown length, e.g. 730 in this example)
potential <- data.frame(Date=rep(seq(Sys.Date()-1095,by="1 day",length=730)),Measurement=rnorm(2)) 

What I would like to do is take a rolling window of size N (i.e. matching the size of the target sample), advance the window one day at a time, and output two columns for each window:

WindowStartDate and the result of cor(target,potentialWindow)

So in pseudo-code (using the generated example above):

  1. Start at Sys.Date()-1095 and take a window of N values
  2. Print (or, probably better, put into a new data frame) Sys.Date()-1095 and the result of cor(target,potentialWindow)
  3. Roll forward +1 day to Sys.Date()-1094 and take a window of N values
  4. Print (or, probably better, put into a new data frame) Sys.Date()-1094 and the result of cor(target,potentialWindow)
  5. etc.

N.B. The +1 day step is a parameter that could be changed depending on how much overlap between windows is desired.
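The steps above, including the adjustable step from the N.B., can be sketched as a small R helper. This is a minimal sketch, not code from the question: the function name rolling_cor and its step argument are hypothetical, and it assumes both data frames have Date and Measurement columns with potential sorted by date, one row per day (so "rows" and "days" coincide).

```r
# Hypothetical helper: correlate a fixed-length target against every
# window of the same length in a longer series, advancing `step` rows each time.
# Assumes Date/Measurement columns and one row per day in `potential`.
rolling_cor <- function(target, potential, step = 1) {
  N <- nrow(target)
  if (nrow(potential) < N) stop("potential is shorter than target")
  # first row of every full-length window
  starts <- seq(1, nrow(potential) - N + 1, by = step)
  corr <- vapply(starts, function(s) {
    cor(target$Measurement, potential$Measurement[s:(s + N - 1)])
  }, numeric(1))
  data.frame(WindowStartDate = potential$Date[starts], Correlation = corr)
}

set.seed(1)
target <- data.frame(Date = seq(Sys.Date(), by = "1 day", length.out = 20),
                     Measurement = rnorm(20))
potential <- data.frame(Date = seq(Sys.Date() - 1095, by = "1 day", length.out = 730),
                        Measurement = rnorm(730))

res <- rolling_cor(target, potential, step = 1)
head(res)
```

With step = 1 this gives 730 - 20 + 1 = 711 windows; a larger step (e.g. step = 7 for weekly starts) simply thins out the start positions and reduces the overlap.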

Upvotes: 0

Views: 190

Answers (1)

bouncyball

Reputation: 10761

Here's a way we can do it. Note that in your original example you only specified rnorm(2), which worked only because data.frame() recycles those two values to fill all 20 (or 730) rows, but that's probably not what you wanted. We just need to initialize a few things and then send everything through a for loop.
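To see that recycling in isolation: data.frame() repeats a shorter column when the row count is an exact multiple of its length, and stops with an error otherwise. The toy data frame below is purely illustrative; the second part reproduces the question's rnorm(2) case.

```r
# data.frame() recycles a shorter column when nrow is a multiple of its length
df <- data.frame(id = 1:6, value = c(10, 20))
df$value
# [1] 10 20 10 20 10 20

# With rnorm(2) only two random numbers are drawn and then repeated,
# so the "time series" just alternates between two values
set.seed(1)
x <- data.frame(Date = seq(Sys.Date(), by = "1 day", length.out = 20),
                Measurement = rnorm(2))
length(unique(x$Measurement))
# [1] 2

# A length that does not divide the row count is an error, not silent recycling
try(data.frame(id = 1:5, value = c(10, 20)))
```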

It seems like we can just pull the date you want from the potential data frame, but if you want to use the Sys.Date() - X formula, I've shown how to do that as well (the Day2 column).
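As a quick check that the two approaches agree, assuming potential is built as in the question (730 consecutive days starting at Sys.Date() - 1095): row i of potential falls on Sys.Date() - (1096 - i), which is exactly what the time_start offset encodes.

```r
# potential as in the question: 730 consecutive days from Sys.Date() - 1095
potential <- data.frame(Date = seq(Sys.Date() - 1095, by = "1 day", length.out = 730),
                        Measurement = rnorm(730))

time_start <- 1096
i <- seq_len(nrow(potential))

# Date pulled straight from the data vs. the Sys.Date() - X formula
from_data    <- potential$Date
from_formula <- Sys.Date() - (time_start - i)

all(from_data == from_formula)
# [1] TRUE
```

The formula only matches because the dates are consecutive with no gaps; if the real data skips days (weekends, missing readings), reading the date from the data frame is the safer choice.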

set.seed(1)
# "Target sample"  (this is always of known fixed length N, e.g. 20 )
target <- data.frame(Date = rep(seq(Sys.Date(), by = "1 day", length = 20)),
                     Measurement = rnorm(20))

# "Potential Sample" (this is always much longer and of unknown length, e.g. 730 in this example)
potential <- data.frame(Date = rep(seq(Sys.Date() - 1095, by = "1 day", length = 730)),
                        Measurement = rnorm(730)) 

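# initialize values
N <- nrow(target)                          # window size (20 here)
len_potential <- nrow(potential) - (N - 1) # number of full windows (711 here)
time_start <- 1096                         # potential starts at Sys.Date() - 1095

result.df <- data.frame(Day  = potential$Date[1],
                        Corr = numeric(len_potential),
                        Day2 = potential$Date[1])

# use a for loop: window i covers rows i..(i + N - 1) of potential
for (i in seq_len(len_potential)) {
  result.df$Day[i]  <- potential$Date[i]                     # start date from the data
  result.df$Corr[i] <- cor(target$Measurement,
                           potential$Measurement[i:(i + N - 1)])
  result.df$Day2[i] <- Sys.Date() - (time_start - i)         # same date via Sys.Date() - X
}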
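If the loop becomes slow for very long series, a loop-free variant is possible in base R. This sketch is not part of the original answer; it uses embed() to build a matrix with one length-N window per row, then relies on cor() returning column-by-column correlations when given a matrix and a vector. It regenerates the same data (same seed and draw order) so it runs on its own.

```r
set.seed(1)
N <- 20
target <- data.frame(Date = seq(Sys.Date(), by = "1 day", length.out = N),
                     Measurement = rnorm(N))
potential <- data.frame(Date = seq(Sys.Date() - 1095, by = "1 day", length.out = 730),
                        Measurement = rnorm(730))

# embed() returns one row per window, but in reverse time order
# (x[t], x[t-1], ..., x[t-N+1]); N:1 flips each row back to forward order
windows <- embed(potential$Measurement, N)[, N:1]

# Transpose so each column is a window; cor(matrix, vector) then gives
# the correlation of every column with the target in a single call
corr <- cor(t(windows), target$Measurement)[, 1]

result.vec <- data.frame(Day  = potential$Date[seq_len(nrow(windows))],
                         Corr = corr)
head(result.vec)
```

Run after the loop above, all.equal(result.vec$Corr, result.df$Corr) should be TRUE. The trade-off is memory: the window matrix holds N values per window, which is fine here (711 x 20) but grows with both N and the series length.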

Also, as a note on posting questions to SO, it is often helpful to include the desired output.

Upvotes: 1
