What is causing this error related to time series attributes?

Question

I am building an autoregressive distributed lag model using the dLagM package in R.

One of the steps in the pipeline is to look at how the two time series used to build the model are correlated.

I am using the suggested code for this step below, but I am getting the error shown. The error seems to say that the two series do not have the same attributes. However, when I call attributes() on the two series, the attributes appear to be the same.

Is there something I am doing wrong?

Here is my code:

install.packages("Ecdat")
library(Ecdat)
inflation <- as.numeric(Mishkin[, 1])
inflation_ts <- ts(inflation,start=c(1950,2), frequency = 12)

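install.packages(c("Quandl", "dplyr", "imputeTS", "dLagM"))
library(Quandl)
library(dplyr)     # %>%, arrange(), row_number()
library(imputeTS)  # na_interpolation()
library(dLagM)     # rolCorPlot()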
Quandl.api_key("ci9fxB")
gdp <- Quandl("FRED/GDP")
gdp <- gdp %>% arrange(-row_number())
gdp <- gdp$Value

gdp_diff <- diff(gdp)
gdp_short <- gdp[1:(length(gdp)-1)]
gdp_change <- (gdp_diff/gdp_short)*100

GDP_mon <- c(sapply(gdp_change, function(gdp_change) c(rep(NA,2),gdp_change)))
GDP_mon <- GDP_mon[2:(length(GDP_mon)-2)]
GDP_mon <- ts(GDP_mon,start=c(1947,1), frequency=12)
GDP_mon <- na_interpolation(GDP_mon, option = "stine")
GDP_mon <- window(GDP_mon,c(1950,2),c(1990,12))

rolCorPlot(y = inflation_ts, x = GDP_mon, width = c(3, 5, 7, 9), level = 0.95,
           main = "Rolling correlations between sea levels and temperature",
           SDtest = TRUE)

attributes(inflation_ts)
attributes(GDP_mon)

John Coleman · Accepted Answer

You are seeing floating-point round-off error. The two series were created with different start times, and the longer one was then windowed to match the shorter. The simpler example below (with ranges that match yours) shows two time series whose tsp attributes print identically but are not equal, together with an easy fix:

x <- ts(rnorm(491), start = c(1950, 2), frequency = 12)
y <- ts(rnorm(528), start = c(1947, 1), frequency = 12)
y <- window(y, c(1950, 2), c(1990, 12))
print(attributes(x)$tsp) #prints 1950.083 1990.917   12.000
print(attributes(y)$tsp) #prints 1950.083 1990.917   12.000
#but:
print(attributes(x)$tsp == attributes(y)$tsp) #prints TRUE FALSE  TRUE (!)

#the fix:

y <- ts(y,start=c(1950,2), frequency = 12)
print(attributes(x)$tsp == attributes(y)$tsp) #prints TRUE TRUE  TRUE
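The two tsp vectors look the same because the default print shows only about 7 significant digits. As a minimal sketch (it rebuilds x and y as in the example above; the seed and the names exact_match and close_match are my own choices, not from the question), you can print more digits and compare with a tolerance instead of ==:

```r
set.seed(1)  # arbitrary seed, only so the sketch is repeatable
x <- ts(rnorm(491), start = c(1950, 2), frequency = 12)
y <- ts(rnorm(528), start = c(1947, 1), frequency = 12)
y <- window(y, c(1950, 2), c(1990, 12))

# tsp() is the base R accessor for attributes(x)$tsp.
# Default printing rounds to about 7 significant digits; ask for more.
print(tsp(x), digits = 17)
print(tsp(y), digits = 17)

# Element-wise difference; any non-zero entry is the hidden discrepancy.
print(tsp(x) - tsp(y))

# identical() is an exact test; all.equal() allows a small relative tolerance.
exact_match <- identical(tsp(x), tsp(y))
close_match <- isTRUE(all.equal(tsp(x), tsp(y)))
print(c(exact = exact_match, close = close_match))
```

If close is TRUE while exact is FALSE, the series cover the same span and differ only by round-off, which is the situation described here.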

There is some strangeness here that I don't understand. I would have thought that as.vector(time(x)) (the times at which the series is sampled) would be essentially the same as seq(a, b, 1/c), where attributes(x)$tsp is (a, b, c). But when I compare the times of x with the sequence generated by seq, I find a strange discrepancy:

> v <- as.vector(time(x))
> w <- seq(attributes(x)$tsp[1],attributes(x)$tsp[2],1/attributes(x)$tsp[3])
> sum(v == w)
[1] 412
> max(abs(v-w))
[1] 2.273737e-13
> which(v != w)
 [1] 255 258 261 264 267 270 273 276 279 282 285 288 291 294 297 300 303 306
[19] 309 312 315 318 321 324 327 330 333 336 339 342 345 348 351 354 357 360
[37] 363 366 369 372 375 378 381 384 387 390 393 396 399 402 405 408 411 414
[55] 417 420 423 426 429 432 435 438 441 444 447 450 453 456 459 462 465 468
[73] 471 474 477 480 483 486 489
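The same kind of mismatch can be produced without any ts object. The sketch below builds one monthly grid three ways that are equal on paper: interpolating between the endpoints, adding multiples of the rounded step 1/12, and dividing integer offsets by the frequency. The variable names are illustrative, and this does not claim to be how time() is implemented; it only shows that the order of floating-point operations is enough to make the grids disagree in the last bits.

```r
a <- 1950 + 1 / 12   # start value corresponding to start = c(1950, 2)
f <- 12              # frequency
n <- 491             # number of observations
b <- a + (n - 1) / f # end value

# Grid 1: interpolate between the two endpoints.
grid_endpoints <- seq(a, b, length.out = n)

# Grid 2: start plus integer multiples of the (rounded) step 1/12.
grid_steps <- a + (0:(n - 1)) * (1 / f)

# Grid 3: start plus integer offsets divided by the frequency.
grid_divide <- a + (0:(n - 1)) / f

# The stored value of 1/12 is only the nearest double, not the exact fraction.
print(sprintf("%.20f", 1 / f))

# Largest disagreement between the grids, and how many points match exactly.
max_diff <- max(abs(grid_endpoints - grid_steps),
                abs(grid_endpoints - grid_divide))
n_exact <- sum(grid_endpoints == grid_steps)
print(max_diff)
print(n_exact)
```

Whatever the exact counts are on your machine, all.equal(grid_endpoints, grid_steps) should still report the grids as equal, since the differences are many orders of magnitude below its default tolerance.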

The strangest thing about the which() output is the non-contiguous nature of the indices where the two vectors differ. The underlying problem is that 1/12 is not exactly representable as a float, so neither v nor w has the property that successive points differ by exactly 1/12. My conjecture is that time series objects adopt an error-reducing strategy which spreads the inevitable error out over the time span. Since y and the original x were initially constructed with different starts, the way this error was spread out differed slightly, in a way that wasn't fixed by window.

Given the non-contiguous nature of these micro-discrepancies, I suspect that code like yours, which windows a larger time series down to the same time span as a smaller one, will sometimes produce time series attributes that are exactly equal and other times produce ones where 1 or 2 of the attributes differ by something like 2.273737e-13. This could lead to hard-to-track-down bugs where code seems to work on test cases but then mysteriously crashes when the input is changed. I am surprised that the documentation on window doesn't mention the danger.
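One way to guard against this is to normalise the attributes right before calling any function that compares them exactly. The helper below is a hypothetical sketch (realign_ts is my own name, not part of dLagM or base R): it checks that the two series agree up to a small tolerance and then copies the tsp attribute of one onto the other, so that both carry bit-identical values.

```r
# Hypothetical helper: give `y` exactly the same tsp attribute as `x`,
# but only if the two already describe the same time span up to round-off.
realign_ts <- function(x, y, tolerance = 1e-8) {
  stopifnot(is.ts(x), is.ts(y), length(x) == length(y))
  if (!isTRUE(all.equal(tsp(x), tsp(y), tolerance = tolerance))) {
    stop("series cover different time spans; window them first")
  }
  tsp(y) <- tsp(x)  # the data in y are untouched, only the attribute changes
  y
}

# Usage with the same construction as in the example above.
x <- ts(rnorm(491), start = c(1950, 2), frequency = 12)
y <- ts(rnorm(528), start = c(1947, 1), frequency = 12)
y <- window(y, c(1950, 2), c(1990, 12))

y_aligned <- realign_ts(x, y)
print(identical(tsp(x), tsp(y_aligned)))
```

In the question's code this would mean calling realign_ts(inflation_ts, GDP_mon) after the window() step and passing the result to rolCorPlot. The tolerance check is there so that genuinely different spans still fail loudly instead of being silently relabelled.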
