Issues in fitting data to linear model

Question

Assume a noiseless AR(1) process y(t) = a*y(t-1). I have the following conceptual questions and would be glad for clarification.

Q1 - Discrepancy between mathematical formulation and implementation - The mathematical formulation of an AR model has the form y(t) = -sum over i = 1 to p of [a(i)*y(t-i)] + eta(t), where p is the model order and eta(t) is white Gaussian noise. But when estimating coefficients using a method like arburg() or least squares, we simply call that function. I do not know if white Gaussian noise is implicitly added. Then, when we solve the AR equation with the estimated coefficients, I have seen that the negative sign is not considered, nor is the noise term added.

What is the correct representation of an AR model, and how do I find the average coefficients over k trials when I have only a single sample of 1000 data points?

Q2 - Coding problem: how to simulate fitted_data for k trials and then find the residuals - I fitted the data "data", generated from an unknown system, and obtained the coefficient by

load('data.txt');

for trials = 1:10

    model = ar(data,1,'ls');
    original_data=data;

    fitted_data(i)=coeff1*data(i-1); %  **OR**
    data(i)=coeff1*data(i-1); 

    fitted_data=data;

    residual= original_data - fitted_data;
    plot(original_data,'r'); hold on; plot(fitted_data);

end

When calculating the residual, is fitted_data obtained as above, by solving the AR equation with the obtained coefficients? Matlab has a function for doing this, but I wanted to write my own. So, after finding the coefficients from the original data, how do I solve the equation? The code above is incorrect. Attached is the plot of the original data and fitted_data.

[Figure: plot of original vs fitted data]

Buck Thorn · Accepted Answer

AR-type models can serve a number of purposes, including linear prediction, linear predictive coding, and noise filtering. The eta(t) are not something we are interested in retaining; rather, part of the point of the algorithms is to remove their influence as far as possible by looking for persistent patterns in the data.
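
To make the role of eta(t) concrete, here is a minimal sketch, assuming an AR(1) process with a known coefficient (the value 0.8 and the variable names are illustrative choices, not taken from the question). The noise drives the process when the data are generated, but it is left out of the prediction, so the prediction error recovers it:

```matlab
% Sketch: eta(t) enters when the data are generated, not when they are predicted.
a   = 0.8;                        % assumed AR(1) coefficient: x(n) = a*x(n-1) + eta(n)
N   = 500;
eta = randn(N,1);                 % white Gaussian driving noise
x   = filter(1, [1 -a], eta);     % generate the process from the noise

pred = filter([0 a], 1, x);       % one-step prediction a*x(n-1), no noise term
e    = x - pred;                  % prediction error

max_diff = max(abs(e - eta));     % e matches eta up to rounding
```

With estimated rather than true coefficients, e only approximates eta(t), which is why the residual is usually checked for whiteness.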

I have textbooks that, in the context of linear prediction, do not include the negative sign that precedes the sum in your expression. Matlab's function lpc, on the other hand, does:

Xp(n) = -A(2)*X(n-1) - A(3)*X(n-2) - ... - A(N+1)*X(n-N)
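
The two sign conventions describe the same recursion; only the sign of the stored coefficients differs. A minimal sketch, assuming an arbitrary AR(2) polynomial chosen purely for illustration:

```matlab
% Sketch: one AR(2) recursion written in both sign conventions.
A = [1 -0.5 0.2];            % lpc-style polynomial, leading coefficient 1 (illustrative values)
c = -A(2:end);               % textbook-style coefficients: y(n) = c(1)*y(n-1) + c(2)*y(n-2)

y = [1; 0.3; zeros(8,1)];    % two arbitrary starting values, noiseless case
z = y;
for n = 3:numel(y)
    y(n) = -A(2)*y(n-1) - A(3)*y(n-2);   % minus-sign convention (lpc)
    z(n) =  c(1)*z(n-1) + c(2)*z(n-2);   % plus-sign convention (some textbooks)
end
max_diff = max(abs(y - z));  % both recursions produce the same sequence
```

So the sign can be dropped only if the coefficients themselves have already been negated.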

I recommend you look at function lpc if you haven't already, and at the examples from the documentation such as the following:

randn('state',0);
noise = randn(50000,1);  % Normalized white Gaussian noise
x = filter(1,[1 1/2 1/3 1/4],noise);
x = x(45904:50000);
% Compute the predictor coefficients, estimated signal, prediction error, and autocorrelation sequence of the prediction error: 
p = lpc(x,3);
est_x = filter([0 -p(2:end)],1,x);    % Estimated signal
e = x - est_x;                        % Prediction error
[acs,lags] = xcorr(e,'coeff');        % ACS of prediction error

The estimated x is computed as est_x. Note how the example uses filter. Quoting the Matlab documentation again, filter(b,a,x) is a "Direct Form II Transposed" implementation of the standard difference equation:

a(1)*y(n) = b(1)*x(n) + b(2)*x(n-1) + ... + b(nb+1)*x(n-nb)
                      - a(2)*y(n-1) - ... - a(na+1)*y(n-na)

which means that in the prior example est_x(n) is computed as

  est_x(n) = -p(2)*x(n-1) -p(3)*x(n-2) -p(4)*x(n-3)

which is what you expect!
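
As a quick check of that reading of filter, the sketch below compares the filter call against an explicit loop. The predictor polynomial p is assumed here (it reuses the generating coefficients from the example rather than an lpc estimate) and the test signal is arbitrary:

```matlab
% Sketch: one-step prediction via filter versus an explicit loop.
p = [1 1/2 1/3 1/4];                          % assumed polynomial, same layout as lpc output
x = sin(0.3*(1:40)') + 0.1*cos(1.7*(1:40)');  % arbitrary test signal

est_filter = filter([0 -p(2:end)], 1, x);     % form used in the documentation example

est_loop = zeros(size(x));
for n = 1:numel(x)
    for k = 1:numel(p)-1
        if n-k >= 1                           % samples before x(1) are treated as zero
            est_loop(n) = est_loop(n) - p(k+1)*x(n-k);
        end
    end
end
max_diff = max(abs(est_filter - est_loop));   % rounding-level difference only
```

Note that every prediction is built from the measured samples x(n-k), not from earlier predictions.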

Edit:

As regards the function ar, the Matlab documentation explains that the output coefficients have the same meaning as in the lpc scenario discussed above.

The right way to evaluate the output of the AR model is to compute

data_armod(i)= -coeff(2)*data(i-1) -coeff(3)*data(i-2) -coeff(4)*data(i-3)

where coeff is the coefficient vector (with leading element coeff(1) = 1, as for lpc) returned with

 model = ar(data,3,'ls');
 coeff = model.a;
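
If you want to do the fit yourself rather than call ar, one possible sketch for the AR(1) case is below. It uses a plain least-squares regression of each sample on the previous one, which is an assumption about what "own implementation" should mean and not a reproduction of what ar does internally. The synthetic signal and the value a_true stand in for your data.txt:

```matlab
% Sketch: least-squares AR(1) fit and one-step residuals without toolbox functions.
a_true = 0.9;                      % assumed coefficient for the synthetic test signal
N = 200;
data = zeros(N,1);
data(1) = 1;
for i = 2:N
    data(i) = a_true*data(i-1);    % noiseless AR(1): y(t) = a*y(t-1)
end

% Least-squares solution of data(2:N) ~ c*data(1:N-1)
c = data(1:N-1) \ data(2:N);
coeff = [1 -c];                    % same sign convention as lpc/ar

% Predict each sample from the ORIGINAL previous sample; do not overwrite data
fitted_data = zeros(N,1);
fitted_data(1) = data(1);          % no past sample is available for i = 1
for i = 2:N
    fitted_data(i) = -coeff(2)*data(i-1);
end
residual = data - fitted_data;
```

Because the fit is deterministic, repeating it in a loop over the same data returns the same coefficient every time; and since the synthetic signal is noiseless, the residual here is zero up to rounding.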
