Reputation: 67
I was practicing with SGDRegressor in sklearn but ran into a problem, which I have simplified to the following code.
import numpy as np
from sklearn.linear_model import SGDRegressor
X = np.array([0,0.5,1]).reshape((3,1))
y = np.array([0,0.5,1]).reshape((3,1))
sgd = SGDRegressor()
sgd.fit(X, y.ravel())
print("intercept=", sgd.intercept_)
print("coef=", sgd.coef_)
And this is the output:
intercept= [0.19835632]
coef= [0.18652387]
All the outputs are around intercept=0.19 and coef=0.18, but obviously the correct answer is intercept=0 and coef=1.
Even in this simple example, the program can't find the correct parameters. I wonder where I've made a mistake.
Upvotes: 4
Views: 1368
Reputation: 23101
With n=10000 data points (drawing samples with replacement from your 3 original points), you get the following results with SGD:
n = 10000
X = np.random.choice([0,0.5,1], n, replace=True)
y = X
X = X.reshape((n,1))
sgd = SGDRegressor(verbose=1)
sgd.fit(X, y)
# -- Epoch 1
# Norm: 0.86, NNZs: 1, Bias: 0.076159, T: 10000, Avg. loss: 0.012120
# Total training time: 0.04 seconds.
# -- Epoch 2
# Norm: 0.96, NNZs: 1, Bias: 0.024337, T: 20000, Avg. loss: 0.000586
# Total training time: 0.04 seconds.
# -- Epoch 3
# Norm: 0.98, NNZs: 1, Bias: 0.008826, T: 30000, Avg. loss: 0.000065
# Total training time: 0.04 seconds.
# -- Epoch 4
# Norm: 0.99, NNZs: 1, Bias: 0.003617, T: 40000, Avg. loss: 0.000010
# Total training time: 0.04 seconds.
# -- Epoch 5
# Norm: 1.00, NNZs: 1, Bias: 0.001686, T: 50000, Avg. loss: 0.000002
# Total training time: 0.05 seconds.
# -- Epoch 6
# Norm: 1.00, NNZs: 1, Bias: 0.000911, T: 60000, Avg. loss: 0.000000
# Total training time: 0.05 seconds.
# -- Epoch 7
# Norm: 1.00, NNZs: 1, Bias: 0.000570, T: 70000, Avg. loss: 0.000000
# Total training time: 0.05 seconds.
# Convergence after 7 epochs took 0.05 seconds
print("intercept=", sgd.intercept_)
print("coef=", sgd.coef_)
# intercept= [0.00057032]
# coef= [0.99892893]
import matplotlib.pyplot as plt  # needed for the plot below
plt.plot(X, y, 'r.')
plt.plot(X, sgd.intercept_ + sgd.coef_*X, 'b-')
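More data is one fix, but the original 3 points can also be fit almost exactly by turning off SGDRegressor's default L2 penalty and letting it run longer. This is a minimal sketch, assuming a recent scikit-learn version that accepts penalty=None; the specific learning-rate settings are illustrative choices, not part of the original question:

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

X = np.array([0, 0.5, 1]).reshape((3, 1))
y = np.array([0, 0.5, 1])

# Disable the default L2 penalty (which shrinks the coefficient toward 0)
# and use a constant learning rate with many passes over the tiny dataset.
sgd = SGDRegressor(penalty=None, learning_rate="constant", eta0=0.1,
                   max_iter=100_000, tol=1e-10, random_state=0)
sgd.fit(X, y)
print("intercept=", sgd.intercept_)
print("coef=", sgd.coef_)
```

With the penalty removed, both parameters should land close to the expected intercept=0 and coef=1, which suggests the shrinkage in the original output comes largely from the default alpha regularization interacting with only 3 samples.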
The following animation shows how the SGD regressor converges to the correct optimum as n increases in the above code:
Upvotes: 1
Reputation: 792
SGD (stochastic gradient descent) is intended for large-scale data. For such a trivial amount of data I would advise you to use simple linear regression instead. As the "No Free Lunch" theorem suggests, there is no one-model-fits-all solution, so you should experiment with different models to find an optimal one (while also keeping in mind the characteristics of your data, such as its distribution, diversity, and skewness). Check out the model below instead:
from sklearn.linear_model import LinearRegression
lr = LinearRegression()
lr.fit(X, y.ravel())
lr.predict([[0],[0.5],[1]])
# output -> array([1.11022302e-16, 5.00000000e-01, 1.00000000e+00])
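As an aside, the reason LinearRegression recovers the exact line here is that it solves ordinary least squares in closed form rather than iteratively. A minimal sketch of the same fit using NumPy's lstsq (not part of the original answer, just an illustration):

```python
import numpy as np

X = np.array([0, 0.5, 1]).reshape((3, 1))
y = np.array([0, 0.5, 1])

# A column of ones models the intercept; lstsq solves min ||A @ w - y||^2 exactly.
A = np.hstack([np.ones_like(X), X])
w, *_ = np.linalg.lstsq(A, y, rcond=None)
print(w)  # [intercept, slope]
```

On this well-conditioned system the solution comes back as intercept 0 and slope 1 up to floating-point precision, matching LinearRegression's result.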
Upvotes: 0