I'm a bit of an R newbie and am a little stuck on how to run a correlation on time-series data where the second vector is much longer and I want to use a rolling time window.
My data looks something like this:
set.seed(1)
# "Target sample" (this is always of known fixed length N, e.g. 20)
target <- data.frame(Date=rep(seq(Sys.Date(),by="1 day",length=20)),Measurement=rnorm(2))
# "Potential Sample" (this is always much longer and of unknown length, e.g. 730 in this example)
potential <- data.frame(Date=rep(seq(Sys.Date()-1095,by="1 day",length=730)),Measurement=rnorm(2))
What I would like to do is take a rolling window of size N (i.e. matching the size of the target sample), move it forward one day at a time, and then print two columns for each window:
WindowStartDate and the result of cor(target, potentialWindow)
So in pseudo-code (using the generated example above):
N.B. The +1 day roll-forward is obviously a variable that could be tweaked depending on the desired overlap.
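A minimal sketch of the procedure described above, with the roll-forward step exposed as a parameter. The helper rolling_cor_step() and its step argument are hypothetical names, the data are regenerated with full-length rnorm() calls instead of rnorm(2), and the potential series is assumed to be at least as long as the target.

```r
set.seed(1)

# Example data: target of fixed length N = 20, potential of length 730
target <- data.frame(Date = seq(Sys.Date(), by = "1 day", length.out = 20),
                     Measurement = rnorm(20))
potential <- data.frame(Date = seq(Sys.Date() - 1095, by = "1 day", length.out = 730),
                        Measurement = rnorm(730))

# Hypothetical helper: correlate the target with every window of the same length
# in `potential`, moving the window start forward by `step` rows each time.
# Assumes nrow(potential) >= nrow(target).
rolling_cor_step <- function(target, potential, step = 1) {
  N <- nrow(target)
  # Start rows of all complete windows of length N
  starts <- seq(1, nrow(potential) - N + 1, by = step)
  corr <- vapply(starts, function(i) {
    cor(target$Measurement, potential$Measurement[i:(i + N - 1)])
  }, numeric(1))
  data.frame(WindowStartDate = potential$Date[starts], Corr = corr)
}

res1 <- rolling_cor_step(target, potential, step = 1)  # one-day steps: 711 windows
res7 <- rolling_cor_step(target, potential, step = 7)  # weekly steps: 102 windows
head(res1)
```

A larger step reduces the overlap between consecutive windows; with step = N the windows no longer overlap at all.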
Here's a way we can do it. Note that in your original example you only specified rnorm(2), which worked because R can recycle arguments, but it's probably not what you wanted. We just need to initialize a few things and then send them through a for loop.
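To illustrate why rnorm(2) still produced a 20-row data frame: data.frame() recycles a shorter column when its length divides the number of rows evenly, so the Measurement column simply repeats the same two values. The small data frames below use made-up values purely for illustration.

```r
# A length-2 column is recycled to fill 4 rows: 10, 20, 10, 20
df <- data.frame(x = 1:4, y = c(10, 20))
print(df$y)

# With rnorm(2), the 20-row "target" series alternates between only two values
set.seed(1)
target <- data.frame(Date = seq(Sys.Date(), by = "1 day", length.out = 20),
                     Measurement = rnorm(2))
length(unique(target$Measurement))

# If the length does not divide the row count, data.frame() stops with an error
res <- tryCatch(data.frame(x = 1:3, y = c(10, 20)), error = function(e) "error")
print(res)
```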
It seems like we can just pull the date you want from the potential data set, but if you want to use the Sys.Date() - X formula, I've shown how to do that as well.
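Why the Sys.Date() - X formula gives the same dates as reading them from potential: if the first potential date is Sys.Date() - 1095, then row i holds Sys.Date() - 1095 + (i - 1), which rearranges to Sys.Date() - (1096 - i). That is where the offset 1096 comes from. A quick check, assuming the 730-day daily series from the question (today is captured once so a midnight rollover cannot split the two computations):

```r
today <- Sys.Date()

# Daily dates starting 1095 days before today (730 of them)
dates <- seq(today - 1095, by = "1 day", length.out = 730)

i <- seq_along(dates)
# Row i is (today - 1095) + (i - 1), i.e. today - (1096 - i)
from_formula <- today - (1096 - i)

all(dates == from_formula)
```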
set.seed(1)
# "Target sample" (this is always of known fixed length N, e.g. 20)
target <- data.frame(Date = rep(seq(Sys.Date(), by = "1 day", length = 20)),
                     Measurement = rnorm(20))
# "Potential Sample" (this is always much longer and of unknown length, e.g. 730 in this example)
potential <- data.frame(Date = rep(seq(Sys.Date() - 1095, by = "1 day", length = 730)),
                        Measurement = rnorm(730))
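# initialize values
N <- nrow(target)                            # window size = length of the target (20 here)
len_potential <- nrow(potential) - (N - 1)   # number of complete windows (711 here)
time_start <- 1096                           # offset so that i = 1 maps to Sys.Date() - 1095
# pre-allocate the result; the Date columns are overwritten inside the loop
result.df <- data.frame(Day = potential[1, 1],
                        Corr = numeric(len_potential),
                        Day2 = potential[1, 1],
                        stringsAsFactors = FALSE)
# use a for loop: window i covers rows i to i + N - 1 of potential
for (i in seq_len(len_potential)) {
  result.df[i, 1] <- as.Date(potential[i, 1])                       # window start date
  result.df[i, 2] <- cor(target[, 2], potential[i:(i + N - 1), 2]) # correlation with target
  result.df[i, 3] <- Sys.Date() - (time_start - i)                 # same date via Sys.Date() - X
}
head(result.df)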
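A loop-free variant, sketched with base R's stats::embed() (a swapped-in technique, not the method used above). Each row of embed(x, N) is a length-N window of x in reverse order, so flipping the columns and transposing gives one window per column, and a single cor() call then correlates the target with every window at once. The data are regenerated so the block runs on its own; result_vec is a hypothetical name.

```r
set.seed(1)
target <- data.frame(Date = seq(Sys.Date(), by = "1 day", length.out = 20),
                     Measurement = rnorm(20))
potential <- data.frame(Date = seq(Sys.Date() - 1095, by = "1 day", length.out = 730),
                        Measurement = rnorm(730))

N <- nrow(target)

# embed() returns one row per window, but with the values in reverse order
# (row i is x[i + N - 1], ..., x[i]); reorder the columns to restore time order.
windows <- embed(potential$Measurement, N)[, N:1, drop = FALSE]

# cor(vector, matrix) correlates the vector with each matrix column, so transpose
# to put one window per column; drop() turns the 1 x k result into a plain vector.
corr <- drop(cor(target$Measurement, t(windows)))

result_vec <- data.frame(WindowStartDate = potential$Date[seq_len(nrow(windows))],
                         Corr = corr)
head(result_vec)
```

This builds the full windows matrix in memory (number of windows times N values), which is fine for a few thousand days but is worth keeping in mind for much longer series, where the for loop above uses less memory.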
Also, as a note on posting questions to SO: it is often helpful to include the desired output.