Reputation: 43
I have a small data set with 47 samples. I'm running linear regression with 2 features.
After running LinearRegression I ran Ridge with alpha=0 and the sag solver. I would expect it to converge quickly and return exactly the same prediction as the one computed by solving the normal equations.
But every time I run Ridge I get a slightly different result, close to the one LinearRegression gives but not exactly the same, and it doesn't matter how many iterations I allow. Is this expected? Why? In the past I've implemented plain gradient descent myself and it converges quickly on this data set.
import sklearn.linear_model
from sklearn import preprocessing

# Ordinary least squares (closed-form solution)
ols = sklearn.linear_model.LinearRegression()
model = ols.fit(x_train, y_train)
print(model.predict([[1650, 3]]))
# [[ 293081.4643349]]

# Ridge with no penalty (alpha=0), solved with SAG on standardized features
scaler = preprocessing.StandardScaler().fit(x_train)
x_scaled = scaler.transform(x_train)
ols = sklearn.linear_model.Ridge(alpha=0, solver="sag", max_iter=99999999)
model = ols.fit(x_scaled, y_train)
x_test = scaler.transform([[1650, 3]])
print(model.predict(x_test))
# [[ 293057.69986594]]
Upvotes: 1
Views: 2530
Reputation: 43
Thank you all for your answers! After reading @sascha's response I read a bit more about Stochastic Average Gradient descent, and I think I've found the reason for this discrepancy: it does indeed come from the "stochastic" part of the algorithm.
Please check the Wikipedia page: https://en.wikipedia.org/wiki/Stochastic_gradient_descent
In regular gradient descent the weights are updated on every iteration with:

w := w - mu * (1/n) * sum_i grad(Q_i(w))

where the second term is the gradient of the cost function over all n samples, multiplied by a learning rate mu.
This is repeated until convergence, and it always gives the same result after the same number of iterations, given the same starting weights.
In Stochastic Gradient Descent the update in every iteration is instead:

w := w - mu * grad(Q_i(w))

where the second term is the gradient at a single sample i, multiplied by the learning rate mu. All the samples are shuffled at the beginning, and the algorithm then cycles through them, one per update.
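To illustrate the difference, here is a minimal NumPy sketch of the two update rules (the toy data, learning rate and iteration counts are made up for illustration, not taken from my actual problem):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(47, 2))                    # toy stand-in for the 47x2 training set
y = X @ np.array([3.0, -2.0]) + rng.normal(scale=0.1, size=47)
mu = 0.01                                       # learning rate

# Regular (batch) gradient descent: deterministic given the same starting weights.
w = np.zeros(2)
for _ in range(1000):
    grad = -2 * X.T @ (y - X @ w) / len(y)      # gradient of the mean squared error
    w -= mu * grad

# Stochastic gradient descent: one sample per update, visited in a random order,
# so the trajectory (and where it stops) depends on the shuffling.
w_sgd = np.zeros(2)
for _ in range(100):                            # passes over the data
    for i in rng.permutation(len(y)):
        grad_i = -2 * X[i] * (y[i] - X[i] @ w_sgd)
        w_sgd -= mu * grad_i

print(w, w_sgd)                                 # close to each other, but not identical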
So I think a couple of things contribute to the behavior I asked about:
(EDITED, see the replies below)
(EDIT) This can be made deterministic by setting the random_state parameter when constructing the Ridge estimator.
(EDIT) The convergence criterion depends on tol (the precision of the solution). By tightening this parameter (I set it to 1e-100) I was able to obtain the same solution as the one reported by LinearRegression. A sketch combining both fixes follows below.
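Here is a sketch of both fixes together (it assumes the x_train, y_train, scaler and x_scaled variables from the code in the question; the random_state value is arbitrary):

from sklearn.linear_model import Ridge

# Fixing random_state makes the sample shuffling reproducible, and the very small
# tol forces SAG to keep iterating until it essentially matches the closed-form solution.
ridge = Ridge(alpha=0, solver="sag", tol=1e-100, max_iter=99999999, random_state=42)
model = ridge.fit(x_scaled, y_train)
print(model.predict(scaler.transform([[1650, 3]])))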
Upvotes: 1
Reputation: 331
The difference between your two outputs may come from the preprocessing that you only apply for the Ridge regression: scaler = preprocessing.StandardScaler().fit(x_train).
By standardizing the data you change its representation, which can lead to different results.
Note also that OLS penalizes the L2 norm of the output differences (expected vs. predicted) only, while the Ridge algorithm additionally penalizes the L2 norm of the coefficients.
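A minimal sketch of the two objectives (the arrays are toy numbers, just for illustration):

import numpy as np

X = np.array([[1.0, 2.0], [2.0, 0.5], [3.0, 1.5]])   # toy inputs
y = np.array([1.0, 2.0, 3.0])                        # toy targets
w = np.array([0.5, 0.1])                             # some candidate coefficients
alpha = 1.0

ols_loss = np.sum((y - X @ w) ** 2)              # OLS: squared output differences only
ridge_loss = ols_loss + alpha * np.sum(w ** 2)   # Ridge: adds the L2 penalty on the coefficients
print(ols_loss, ridge_loss)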
Upvotes: 0