Reputation: 179
I am trying to nowcast a time series (Y) using another time series (X) as a predictor. X and Y are cointegrated. Y is monthly data from Jan 2012 to Oct 2016, and X runs from Jan 2012 to Feb 2017.
So, I fitted a VECM as shown in this video: https://www.youtube.com/watch?v=x9DcUA9puY0
Then, to obtain predicted values, I transformed it into a VAR with the vec2var command, following this topic: https://stats.stackexchange.com/questions/223888/how-to-forecast-from-vecm-in-r
But I cannot forecast Y with known X the way it can be done using the predict function with a linear regression model. Also, I cannot obtain the modelled (fitted) Y values (Y hat).
This is my code:
# Cointegrated_series is a ZOO object, which contains two time series X and Y
library("zoo")
library("xts")
library("urca")
library("vars")
# Obtain lag length
Lagl <- VARselect(Cointegrated_series)$selection[[1]]
# Conduct the Johansen maximum-eigenvalue test
cointest <- ca.jo(Cointegrated_series, K = Lagl, type = "eigen", ecdet = "const",
                  spec = "transitory")
# Fit VECM
vecm <- cajorls(cointest)
# Transform VECM to VAR
var <- vec2var(cointest)
Then I tried to use the predict function in different ways: predict(var), predict(var, newdata = 50), predict(var, newdata = 1000). The result is always the same.
I also tried the tsDyn package and the newdata argument of its predict method, as mentioned here: https://stats.stackexchange.com/questions/261849/prediction-from-vecm-in-r-using-external-forecasts-of-regressors?rq=1
It does not work. My newdata is a ZOO object in which the X series has values from Nov 2016 to Feb 2017 and the Y series is all NAs. So, the method returns NAs in the forecast:
# Cointegrated_series is a ZOO object, which contains
# two time series X and Y from Jan 2012 to Oct 2016. Both X and Y are values.
# newDat is a ZOO object, which contains two time series
# X and Y from Nov 2016 to Feb 2017. X are values, Y are NAs.
library(tsDyn)
vecm <- VECM(Cointegrated_series, lag = 2)
predict(vecm, newdata = newDat, n.ahead = 5)
This is the result:
Y X
59 NA NA
60 NA NA
61 NA NA
62 NA NA
63 NA NA
For comparison, this is what I get when calling predict without the newdata argument:
predict(vecm, n.ahead=5)
Y X
59 65.05233 64.78006
60 70.54545 73.87368
61 75.65266 72.06513
62 74.76065 62.97242
63 70.03992 55.81045
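So, my main questions are: how can I forecast (nowcast) Y given known values of X, and how can I obtain the fitted values of Y (Y hat)?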
Besides that, I also couldn't find answers to these questions:
How do I obtain the Akaike information criterion (AIC) for a VECM in R?
Do the vars and urca packages provide F and t statistics for a VECM?
UPD 10.04.2017 I slightly edited the question. I noticed that my problem is a "ragged edge" problem, and that it is incorrect to call it "forecasting": it is "nowcasting".
UPD 11.04.2017
Thank you for answering!
Here is the full code:
library("lubridate")
library("zoo")
library("xts")
library("urca")
library("vars")
library("forecast")
Dat <- dget(file = "https://getfile.dokpub.com/yandex/get/https://yadi.sk/d/VJpQ75Rz3GsDKN")
NewDat <- dget(file = "https://getfile.dokpub.com/yandex/get/https://yadi.sk/d/T7qxxPUq3GsDLc")
Lagl <- VARselect(Dat)$selection[[1]]
# vars package
cointest_e <- ca.jo(Dat, K = Lagl, type = "eigen", ecdet = "const",
                    spec = "transitory")
vecm <- cajorls(cointest_e)
var <- vec2var(cointest_e)
Predict1 <- predict(var)
Predict2 <- predict(var, newdata = NewDat)
Predict1$fcst$Y
Predict2$fcst$Y
Predict1$fcst$Y == Predict2$fcst$Y
Predict1$fcst$X == Predict2$fcst$X
# As we see, Predict1 and Predict2 are identical, so the information in NewDat
# was not taken into account.
library("tsDyn")
vecm2 <-VECM(Dat, lag=3)
predict(vecm2)
predict(vecm2, newdata=NewDat)
If dget returns an error, please download my data here:
https://yadi.sk/d/VJpQ75Rz3GsDKN - for Dat
https://yadi.sk/d/T7qxxPUq3GsDLc - for NewDat
About nowcasting
By nowcasting I mean current-month or previous-month forecasts of unavailable data using currently available data. Here are some references:
Giannone, Reichlin, Small: Nowcasting: The real-time informational content of macroeconomic data (2008)
Now-Casting and the Real-time Data Flow (2013)
Marcellino, Schumacher: Factor MIDAS for Nowcasting and Forecasting with Ragged-Edge Data: A Model Comparison for German GDP (2010)
Upvotes: 5
Views: 1785
Reputation: 367
First of all, thank you so much @Matifou for your awesome package. I am late in responding, but I was struggling with the same question and did not find a solution. That is why I implemented the following function; I hope it will be useful to some people:
#' @title Special predict method for VECM models
#' @description Predict method for VECM models when all endogenous variables
#' but one are known. For the moment it is only valid for one cointegrating equation.
#' Requires the tsDyn and quantmod packages.
#' @param object an object of class 'VECM'
#' @param new_data a data frame containing the forecasts of all the endogenous variables
#' but one; if there are exogenous variables, their forecasts must be provided too.
#' @param predicted_var a string naming the endogenous variable to be predicted.
#' @return A list with the predicted variable, the predicted values and a data frame with the
#' detailed values used to construct the forecast.
#' @examples
#' data(zeroyld)
#' # Fit a VECM with Johansen MLE estimator:
#' vecm.jo <- VECM(zeroyld, lag = 2, estim = "ML")
#' predict.vecm(vecm.jo, new_data = data.frame("long.run" = c(7:10)), predicted_var = "short.run")
predict.vecm <- function(object, new_data, predicted_var){
  if (inherits(object, "VECM")) {
    # Only valid for VECM models
    # Get endogenous and exogenous variables
    summary_vecm <- summary(object)
    model_vars <- colnames(object$model)
    endovars <- sub("Equation ", "", rownames(summary_vecm$bigcoefficients))
    if (!(predicted_var %in% endovars)) {
      stop("You must provide a valid endogenous variable.")
    }
    exovars <- NULL
    if (object$exogen) {
      ind_endovars <-
        unlist(sapply(endovars, function(x) grep(x, model_vars), simplify = FALSE))
      exovars <- model_vars[-ind_endovars]
      exovars <- exovars[exovars != "ECT"]
    }
    # First step: join new_data and the last (lags + 1) values from the calibration data
    new_data <- data.frame(new_data)
    if (!all(colnames(new_data) %in% c(endovars, exovars))) {
      stop("new_data must have valid endogenous or exogenous column names.")
    }
    # Endogenous variables except the one to be predicted
    endovars2 <- endovars[endovars != predicted_var]
    # if (!all(colnames(new_data) %in% c(endovars2, exovars))) {
    #   stop("new_data must have valid endogenous (all but the one to be predicted) or exogenous column names.")
    # }
    new_data <- new_data[, c(endovars2, exovars), drop = FALSE]
    new_data <- cbind(NA, new_data)
    colnames(new_data) <- c(predicted_var, endovars2, exovars)
    # Previous values, needed to obtain lagged values and first differences (lags + 1)
    dt_tail <- data.frame(tail(object$model[, c(endovars, exovars), drop = FALSE], object$lag + 1))
    new_data <- rbind(dt_tail, new_data)
    # Second step: get the long-run relationship forecast (ECT term)
    ect_vars <- rownames(object$model.specific$beta)
    if ("const" %in% ect_vars) {
      new_data$const <- 1
    }
    ect_coeff <- object$model.specific$beta[, 1]
    new_data$ECT_0 <-
      apply(sweep(new_data[, ect_vars], MARGIN = 2, ect_coeff, `*`), MARGIN = 1, sum)
    # Get ECT-1 (lag 1)
    new_data$ECT <- as.numeric(quantmod::Lag(new_data$ECT_0, 1))
    # Third step: get differences of the endogenous and exogenous variables provided in new_data
    diff_data <- apply(new_data[, c(endovars, exovars)], MARGIN = 2, diff)
    colnames(diff_data) <- paste0("DIFF_", c(endovars, exovars))
    diff_data <- rbind(NA, diff_data)
    new_data <- cbind(new_data, diff_data)
    # Fourth step: get the lags of the differenced endogenous variables
    for (k in 1:object$lag) {
      iter <- myLag(new_data[, paste0("DIFF_", endovars)], k)
      colnames(iter) <- paste0("DIFF_", endovars, " -", k)
      new_data <- cbind(new_data, iter)
    }
    # Fifth step: recursive calculation
    vecm_vars <- colnames(summary_vecm$bigcoefficients)
    if ("Intercept" %in% vecm_vars) {
      new_data$Intercept <- 1
    }
    vecm_vars[!(vecm_vars %in% c("ECT", "Intercept"))] <-
      paste0("DIFF_", vecm_vars[!(vecm_vars %in% c("ECT", "Intercept"))])
    equation <- paste("Equation", predicted_var)
    equation_coeff <- summary_vecm$coefficients[equation, ]
    predicted_var2 <- paste0("DIFF_", predicted_var)
    for (k in (object$lag + 2):nrow(new_data)) {
      # Estimate y_diff
      new_data[k, predicted_var2] <-
        sum(sweep(new_data[k, vecm_vars], MARGIN = 2, equation_coeff, `*`))
      # Estimate y_diff lags
      for (j in 1:object$lag) {
        new_data[, paste0(predicted_var2, " -", j)] <-
          as.numeric(quantmod::Lag(new_data[, predicted_var2], j))
      }
      # Estimate y
      new_data[k, predicted_var] <-
        new_data[(k - 1), predicted_var] + new_data[k, predicted_var2]
      # Estimate ECT
      new_data[k, "ECT_0"] <- sum(sweep(new_data[k, ect_vars], MARGIN = 2, ect_coeff, `*`))
      if (k < nrow(new_data)) {
        new_data[k + 1, "ECT"] <- new_data[k, "ECT_0"]
      }
    }
    predicted_values <- new_data[(object$lag + 2):nrow(new_data), predicted_var]
  } else {
    stop("You must provide a valid VECM model.")
  }
  return(
    list(
      predicted_variable = predicted_var,
      predicted_values = predicted_values,
      data = new_data
    )
  )
}
# Lag function applied to data frames
myLag <- function(data, lag) data.frame(unclass(data[c(rep(NA, lag), 1:(nrow(data) - lag)), ]))
@Andrey Goloborodko, in your example, you should run:
NewDat <- NewDat[-1,] # Only the new observations need to be provided
predict.vecm(vecm2, new_data=NewDat, predicted_var = "Y")
# $predicted_variable
# [1] "Y"
#
# $predicted_values
# [1] 65.05233 61.29563 59.45109
#
# $data
# Y X ECT_0 ECT DIFF_Y DIFF_X DIFF_Y -1 DIFF_X -1 DIFF_Y -2
# jul. 2016 92.40506 100 -29.0616718 NA NA NA NA NA NA
# ago. 2016 94.03255 78 -0.7115037 -29.0616718 1.627486 -22 NA NA NA
# sep. 2016 78.84268 53 14.4653067 -0.7115037 -15.189873 -25 1.627486 -22 NA
# oct. 2016 67.99277 52 4.8300645 14.4653067 -10.849910 -1 -15.189873 -25 1.627486
# nov. 2016 65.05233 51 3.1042967 4.8300645 -2.940435 -1 -10.849910 -1 -15.189873
# dic. 2016 61.29563 50 0.5622618 3.1042967 -3.756702 -1 -2.940435 -1 -10.849910
# ene. 2017 59.45109 55 -7.3556104 0.5622618 -1.844535 5 -3.756702 -1 -2.940435
# DIFF_X -2 DIFF_Y -3 DIFF_X -3 Intercept
# jul. 2016 NA NA NA 1
# ago. 2016 NA NA NA 1
# sep. 2016 NA NA NA 1
# oct. 2016 -22 NA NA 1
# nov. 2016 -25 1.627486 -22 1
# dic. 2016 -1 -15.189873 -25 1
# ene. 2017 -1 -10.849910 -1 1
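The recursion that the function above carries out can be illustrated on a toy example in base R. This is only a sketch under stated assumptions: a VECM with a single lag, and coefficients (b_x, alpha, const, g_y, g_x) that are made-up numbers for illustration, not estimates from the data in this thread.

```r
# Hypothetical coefficients for the Y equation of a one-lag VECM
# (illustrative values only, not estimated from the thread's data)
b_x   <- 0.8    # long-run coefficient: ECT = Y - b_x * X
alpha <- -0.3   # adjustment speed of Y towards the long-run relationship
const <- 0.1    # intercept of the Y equation
g_y   <- 0.2    # coefficient on the lagged difference of Y
g_x   <- 0.1    # coefficient on the lagged difference of X

y <- c(10.0, 10.4)                     # last two observed values of Y
x <- c(12.0, 12.5, 13.0, 12.8, 13.1)   # two observed X values, then three known "future" ones

for (t in 3:length(x)) {
  # Error-correction term of the previous period
  ect_lag <- y[t - 1] - b_x * x[t - 1]
  # Predicted change in Y: only values dated t - 1 or earlier enter
  d_y <- const + alpha * ect_lag +
    g_y * (y[t - 1] - y[t - 2]) +
    g_x * (x[t - 1] - x[t - 2])
  # Cumulate the change to get the level of Y
  y[t] <- y[t - 1] + d_y
}

nowcast <- y[3:length(y)]
nowcast
```

Note that the first predicted value depends only on observed data, so it coincides with the unconditional forecast; this is consistent with 65.05233 appearing as the first value both in the output above and in the question's predict output without newdata. The known X values only influence Y from the second step on, and X at time t never enters the prediction of Y at time t.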
Upvotes: 1
Reputation: 8940
I feel your question is more about how to do nowcasting for cointegrated variables, so let's address that first and see later how to implement it in R.
In general, according to Granger's representation theorem, cointegrated variables can be represented in multiple forms:
Long term relationship: contemporaneous values of y and x
VECM representation: (diff of) y and x explained by (diff of) lags, and error-correction term at previous period.
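For two variables, these two representations can be written out as follows. The notation is introduced here for illustration only (it does not come from the question): $\mu$ and $\beta$ are the long-term coefficients, $\alpha$ is the adjustment coefficient, $\gamma_i$ and $\delta_i$ are short-run coefficients, and $p$ is the number of lagged differences.

```latex
\begin{aligned}
\text{Long-term relationship:}\quad
  & y_t = \mu + \beta x_t + u_t, \qquad u_t \sim I(0) \\
\text{VECM (equation for } y\text{):}\quad
  & \Delta y_t = c + \alpha \left( y_{t-1} - \mu - \beta x_{t-1} \right)
    + \sum_{i=1}^{p} \left( \gamma_i \, \Delta y_{t-i} + \delta_i \, \Delta x_{t-i} \right)
    + \varepsilon_t
\end{aligned}
```

The first equation contains $x_t$ itself, while every term on the right-hand side of the second is dated $t-1$ or earlier.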
So I am not sure how you would do nowcasting in the VECM representation, since it includes only past values? I can see two possibilities:
Do nowcasting based on the long-term relationship. So you just run standard OLS, and predict from there.
Do nowcasting based on a structural VECM, where you add contemporaneous values of the variables you know (X). In R, you could do this with the urca package; you would need to check, though, whether the predict function allows you to supply known X values.
Regarding the long-term relationship approach, what is interesting is that you can obtain forecasts for X and Y from the VECM (without known X) and a forecast for Y from the long-term relationship with known X. Comparing the known X with the X predicted by the VECM gives you an idea of the accuracy of your model, which you could use to build a forecast-averaging scheme for your Y.
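A minimal sketch of the first possibility (nowcasting from the long-term relationship) in base R could look like this. The data are simulated here, since the asker's files are not reproduced; the sample sizes mimic the question (Y observed for 58 months, X for 62), and the names train, new_x and lt_fit are made up for the example.

```r
set.seed(1)

n     <- 62   # Jan 2012 .. Feb 2017, monthly
n_obs <- 58   # Y is observed only through Oct 2016

# Simulated data: X is a random walk, Y is cointegrated with X by construction
x <- cumsum(rnorm(n))
y <- 2 + 0.8 * x + rnorm(n, sd = 0.3)

train <- data.frame(Y = y[1:n_obs], X = x[1:n_obs])
new_x <- data.frame(X = x[(n_obs + 1):n])   # known X for the months where Y is missing

# Static long-term regression estimated by OLS
lt_fit <- lm(Y ~ X, data = train)

# In-sample fitted values ("Y hat")
y_hat <- fitted(lt_fit)

# Nowcast of Y for the months where only X is available
nowcast <- predict(lt_fit, newdata = new_x, interval = "prediction")
nowcast
```

Here predict works with newdata exactly as for any linear model. The intervals should be read as indicative only, because the usual OLS inference is not strictly valid in a cointegrating regression.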
Upvotes: 2