Fitting AR model to data in Matlab

Question

I have some historical data RV that I want to fit a model to. The model is:

RV(t+1) = C0 + C1*RV(t) + C2*RV_weekAverage(t-5) + C3*RV_monthAverage(t-30) + e

where the future RV depends on the previous RV and averages of some previous RV values.

t - is time

C0, C1, C2, C3 - are parameters to be determined

RV_weekAverage = (1/5)*(sum of RVs from t-1 to t-5)

RV_monthAverage = (1/30)*(sum of RVs from t-1 to t-30)

e - error

I think I am supposed to use an AR model, but I am not sure how to implement it, because the right-hand side of an AR model doesn't contain averages, only individual previous values such as:

RV(t+1) = C0 + C1*RV(t) + C2*RV(t-1) + e

To use the AR model, I would have to try:

RV(t+1) = C0 + C1*RV(t) + C2*(1/5)*[RV(t-1)+...+RV(t-5)] + C3*(1/30)*[RV(t-1)+...+RV(t-30)] + e

I am not sure how to include the factor of (1/5) or (1/30) into the model without interfering with the parameters C0, C1, C2, and C3 when I try to estimate them. This is all I have so far:

model = arima(6,0,0)
fit = estimate(model,RV)

Colin T Bowers · Accepted Answer

First things first: What you have here does not appear to be a programming problem but rather an econometrics problem. Because of this, it is perhaps better suited to Cross Validated. If this is the case, then a moderator may choose to migrate your question (and this answer) over there.

Having said all that, I thought I might still provide an answer here.

You appear to be dealing with a time series regression involving - if my acronym guessing skills are correct - realized volatility, or realized variance. Your regressand is RV_{t+1}, and regressors are RV_{t}, a linear combination of RV_{t-1} to RV_{t-5} and a linear combination of RV_{t-1} to RV_{t-30}.

Given that your set of regressors contains lags of your regressand, I'm guessing someone has told you to look at AR(p) models, and that is where the problems started :-) Sure, that is one way to go about this problem, but personally I think it is the wrong way. Why? Because your lags run from t back to t-30, so you're looking at an AR(31) model, but as you state clearly in the question, your model only contains 4 parameters. Why have over 30 regressors when you only need to estimate 4 parameters? It implies your estimation methodology will need to accommodate constraints on the parameters, which is just going to make life hard for you.
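To see what those constraints would look like, here is a minimal sketch that expands the model into one coefficient per individual lag, from RV(t) back to RV(t-30). The values of c1, c2 and c3 are hypothetical and chosen only for illustration; the name phi is mine, not something from the question.

```matlab
% Hypothetical parameter values, chosen only for illustration
c1 = 0.4; c2 = 0.3; c3 = 0.2;

% Coefficient on each lag of RV(t+1): lag 1 is RV(t), lag 31 is RV(t-30)
phi = zeros(31, 1);
phi(1)    = c1;               % RV(t)
phi(2:6)  = c2/5 + c3/30;     % RV(t-1) ... RV(t-5): appear in both averages
phi(7:31) = c3/30;            % RV(t-6) ... RV(t-30): monthly average only

% Many lag coefficients, but only a few distinct values among them
numDistinct = numel(unique(phi));
fprintf('%d lag coefficients, %d distinct values\n', numel(phi), numDistinct);
```

An unrestricted AR fit would estimate every entry of phi freely, so the equality restrictions within phi(2:6) and within phi(7:31) would have to be imposed separately.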

Fortunately, in your case, it can be avoided. In fact, in my opinion, you have already written down the appropriate form in the question!

I'm going to simplify the notation a bit: Let y_{t+1} = RV_{t+1}, X1_t = RV_t, X2_t = (1/5)(RV_{t-1} + ... + RV_{t-5}), and X3_t = (1/30)(RV_{t-1} + ... + RV_{t-30}). Now we can write the regression equation as:

y_{t+1} = c0 + c1*X1_t + c2*X2_t + c3*X3_t + e_t

This is a straightforward time series regression with lags. Forget about AR(p)-specific estimation methods; you can just do simple, reliable OLS on this. If the residuals pass a Durbin-Watson test, then it is likely that the OLS estimator will be consistent and, given a few extra assumptions, the Best Linear Unbiased Estimator (BLUE).
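Before estimating anything, it can help to check the timing of y, X1, X2 and X3 on a toy series. The sketch below assumes a made-up series with RV(k) = k, so each index can be read straight off the value; the variable names are illustrative only.

```matlab
% Toy series where RV(k) = k, so every index is readable from the value
RV = (1:40)';

t = 31;                        % first t with a full 30-observation history
y_next = RV(t+1);              % regressand y_{t+1} = RV_{t+1}
X1_t   = RV(t);                % RV_t
X2_t   = mean(RV(t-5:t-1));    % (1/5)(RV_{t-1} + ... + RV_{t-5})
X3_t   = mean(RV(t-30:t-1));   % (1/30)(RV_{t-1} + ... + RV_{t-30})

fprintf('y = %g, X1 = %g, X2 = %g, X3 = %g\n', y_next, X1_t, X2_t, X3_t);
```

The first usable observation is therefore y = RV(32), paired with RV(31), the mean of RV(26) to RV(30), and the mean of RV(1) to RV(30).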

Here is some example code to get you started:

%# Randomly generate some observations
T = 1000;
RV = randn(T, 1);

%# Construct your variables
y = RV(32:end); %# your regressand
X1 = RV(31:end-1); %# first lag of your regressand (ie your first regressor)
X2 = conv(RV(26:end-2), (1/5) * ones(5, 1), 'valid'); %# moving window average over 5 observations (ie your second regressor)
X3 = conv(RV(1:end-2), (1/30) * ones(30, 1), 'valid'); %# moving window average over 30 observations (ie your third regressor)

%# Build your matrix of regressors (including a vector of ones for the constant term)
X = [ones(length(X1), 1), X1, X2, X3];

%# Perform OLS
[Coef, CoefConfInt, e] = regress(y, X);

%# Perform a durbin watson test on the residuals
[DWpVal, DWStat] = dwtest(e, X);
if DWpVal < 0.05; fprintf('WARNING: residuals from regression appear to be serially correlated. Estimated coefficients may not be consistent.\n'); end

Coef stores your estimated coefficients, and CoefConfInt stores confidence intervals for those estimates. I've even incorporated a test to check that your residuals pass the Durbin-Watson test. Obviously you'll need to substitute your actual RV for my randomly generated RV. If in your actual regression the residuals are not passing the Durbin-Watson test, then you may need to look into methods such as Feasible GLS or else have a read of the time-series chapters of Greene's "Econometric Analysis" - but hopefully it won't come to that.
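If you want to convince yourself the setup is right before using real data, one option is to simulate RV from the model with known coefficients and check that OLS recovers them. The following is only a sketch under my own assumptions: the "true" coefficients in cTrue and the noise scale of 0.1 are made up, and I use the backslash operator instead of regress so that no toolbox is needed. It also shows how a one-step-ahead forecast could be formed from the fitted coefficients.

```matlab
% Hypothetical "true" parameters [c0; c1; c2; c3], with c1 + c2 + c3 < 1
cTrue = [0.1; 0.3; 0.2; 0.1];
T = 20000;

% Simulate RV from the model, starting at its unconditional mean
RV = zeros(T, 1);
RV(1:31) = cTrue(1) / (1 - sum(cTrue(2:4)));
for s = 32:T   % s plays the role of t+1
    RV(s) = cTrue(1) + cTrue(2) * RV(s-1) ...
          + cTrue(3) * mean(RV(s-6:s-2)) ...
          + cTrue(4) * mean(RV(s-31:s-2)) ...
          + 0.1 * randn;   % assumed noise scale
end

% Same variable construction as in the example code
y  = RV(32:end);
X1 = RV(31:end-1);
X2 = conv(RV(26:end-2), (1/5)  * ones(5, 1),  'valid');
X3 = conv(RV(1:end-2),  (1/30) * ones(30, 1), 'valid');
X  = [ones(length(X1), 1), X1, X2, X3];

% OLS via backslash (least-squares solution of X * c = y)
cHat = X \ y;

% One-step-ahead forecast of RV(T+1) from the most recent observations
xNext = [1, RV(end), mean(RV(end-5:end-1)), mean(RV(end-30:end-1))];
rvForecast = xNext * cHat;

disp([cTrue, cHat]);   % true values next to estimates
fprintf('Forecast of RV(T+1): %.4f\n', rvForecast);
```

The estimates will differ from cTrue by sampling noise, and the coefficient on the 30-observation average is typically the least precisely estimated because that regressor varies the least.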

If you think this response answers the question, then feel free to click the tick mark next to it.
