Residual Estimator

Question

I have a pipeline like this:

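from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline

attribute_est = Pipeline([
    ('jsdf', DictVectorizer()),
    ('clf', Ridge()),
])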

I pass it data like this:

{
  'Master_card' : 1,
  'Credit_Cards': 1,
  'casual_ambiance': 0,
  'Classy_People': 0
}

My model does not predict very well. Someone suggested the following:

You may find it difficult to find a single regressor that does well enough. A common solution is to use a linear model to fit the linear part of some data, and use a non-linear model to fit the residual that the linear model can't fit. Build a residual estimator that takes as an argument two other estimators. It should use the first to fit the raw data and the second to fit the residuals of the first.

What is meant by a residual estimator? Can you provide an example, please?

bnaecker · Accepted Answer

A residual is the difference between a true data value and the value predicted for it by some estimator. The simplest example is linear regression, where the residuals are the vertical distances between the actual data points and the best-fit line. Least-squares fitting of a line minimizes the sum of the squares of these residuals.
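To make the definition concrete, here is a small NumPy sketch on made-up data (the line y = 2x + 1 plus Gaussian noise is an illustrative assumption, not your data). It fits a straight line by least squares with `np.polyfit`, computes the residuals as true values minus predictions, and checks that nudging the slope away from the least-squares value increases the sum of squared residuals.

```python
import numpy as np

# Made-up data for illustration: a line y = 2x + 1 plus Gaussian noise.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2 * x + 1 + rng.normal(scale=1.0, size=x.size)

# Least-squares fit of a straight line; polyfit returns [slope, intercept].
slope, intercept = np.polyfit(x, y, deg=1)
y_pred = slope * x + intercept

# Residuals: true values minus predicted values.
residuals = y - y_pred
sse = np.sum(residuals ** 2)

# Any other line (here: a slightly steeper one) has a larger sum of squared residuals.
other_sse = np.sum((y - ((slope + 0.1) * x + intercept)) ** 2)

print(f"SSE of least-squares line: {sse:.2f}")
print(f"SSE of a perturbed line:   {other_sse:.2f}")
```

Because the fit includes an intercept, the least-squares residuals also sum to (numerically) zero.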

The recommendation you were given suggests using two estimators. The first fits the data itself. In the linear case, this is a least-squares linear fit, for example with scikit-learn's LinearRegression, or the Ridge model already in your pipeline (which is least squares with an added penalty on the coefficients).
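As a sketch of that first stage on your own setup, the pipeline from the question can be fit and its residuals computed directly. The four records and their target values below are hypothetical, invented only so the snippet runs; substitute your real data.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline

# Hypothetical records in the same shape as the question's data;
# the target values are made up purely for illustration.
X = [
    {'Master_card': 1, 'Credit_Cards': 1, 'casual_ambiance': 0, 'Classy_People': 0},
    {'Master_card': 0, 'Credit_Cards': 1, 'casual_ambiance': 1, 'Classy_People': 0},
    {'Master_card': 1, 'Credit_Cards': 0, 'casual_ambiance': 1, 'Classy_People': 1},
    {'Master_card': 0, 'Credit_Cards': 0, 'casual_ambiance': 0, 'Classy_People': 1},
]
y = [3.5, 4.0, 4.5, 2.5]

# First stage: the linear model from the question.
linear = Pipeline([
    ('jsdf', DictVectorizer()),
    ('clf', Ridge()),
])
linear.fit(X, y)

# Residuals of the first stage: the part of y the linear fit leaves unexplained.
residuals = [true - pred for true, pred in zip(y, linear.predict(X))]
print(residuals)
```

These residuals are exactly what the second estimator would be trained on.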

The second estimator then tries to fit the residuals, i.e., the difference between the actual data points and the linear fit. In the least-squares case, this effectively detrends the data and then fits what is left over. If you expect the data really is a line with additive Gaussian noise, the residuals are just that noise, and you might pick a Gaussian to model them. If you know something about the underlying noise distribution, use that as your second estimator instead. In the setting described in your question, where the linear model leaves systematic structure unexplained, the second estimator would be a non-linear regressor that can pick up that structure.
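Putting both stages together, one possible way to write the estimator described in the suggestion is a small scikit-learn-compatible class. This is a sketch, not an established scikit-learn API: the class name `ResidualEstimator`, its parameter names, the toy data, and the choice of `RandomForestRegressor` as the non-linear second stage are all assumptions for illustration. `BaseEstimator`, `RegressorMixin`, and `clone` are real scikit-learn utilities that let the class be used inside a `Pipeline`.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin, clone
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline


class ResidualEstimator(RegressorMixin, BaseEstimator):
    """Hypothetical helper (not part of scikit-learn).

    Fits `base_estimator` to the raw data, then fits `residual_estimator`
    to the residuals that the first model leaves behind.
    """

    def __init__(self, base_estimator, residual_estimator):
        self.base_estimator = base_estimator
        self.residual_estimator = residual_estimator

    def fit(self, X, y):
        y = np.asarray(y, dtype=float)
        # Stage 1: fit a clone of the first (e.g. linear) model to the raw targets.
        self.base_ = clone(self.base_estimator).fit(X, y)
        # Residuals: true values minus the first model's predictions.
        residuals = y - self.base_.predict(X)
        # Stage 2: fit a clone of the second (e.g. non-linear) model to the residuals.
        self.residual_ = clone(self.residual_estimator).fit(X, residuals)
        return self

    def predict(self, X):
        # Final prediction = first model's estimate + predicted correction.
        return self.base_.predict(X) + self.residual_.predict(X)


# Toy data (an assumption): a linear trend plus a non-linear wiggle.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 1))
y = 2 * X[:, 0] + np.sin(3 * X[:, 0])

linear_only = Ridge().fit(X, y)
combined = ResidualEstimator(
    Ridge(), RandomForestRegressor(n_estimators=50, random_state=0)
).fit(X, y)
print("Ridge alone, training R^2:   ", linear_only.score(X, y))
print("Ridge + forest, training R^2:", combined.score(X, y))

# Dropping it into the pipeline from the question:
attribute_est = Pipeline([
    ('jsdf', DictVectorizer()),
    ('clf', ResidualEstimator(Ridge(), RandomForestRegressor(random_state=0))),
])
```

The printed scores are on the training data, so they only show that the second stage picks up structure the linear model missed. To judge whether the combination actually generalizes better on your data, compare the two with cross-validation (for example `sklearn.model_selection.cross_val_score`).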
