Reputation: 11
I'm studying machine learning with the book 'Python Machine Learning' by Sebastian Raschka.
My question is about the learning rate eta0 in the scikit-learn Perceptron class. The following code, taken from that book, implements an Iris data classifier using a Perceptron.
(...omitted...)
from sklearn import datasets
from sklearn.linear_model import Perceptron
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
iris = datasets.load_iris()
X = iris.data[:, [2, 3]]
y = iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
sc = StandardScaler()
sc.fit(X_train)
X_train_std = sc.transform(X_train)
X_test_std = sc.transform(X_test)
ml = Perceptron(eta0=0.1, n_iter=40, random_state=0)  # n_iter was renamed max_iter in newer scikit-learn versions
ml.fit(X_train_std, y_train)
y_pred = ml.predict(X_test_std)
print('total test:%d, errors:%d' %(len(y_test), (y_test != y_pred).sum()))
print('accuracy: %.2f' %accuracy_score(y_test, y_pred))
My question is the following: the result (total test, errors, accuracy) does not change for various eta0 values.
The same result of "total test=45, errors=4, accuracy=0.91" comes out with both eta0=0.1 and eta0=100. What is wrong?
Upvotes: 1
Views: 4711
Reputation: 1998
I will try to briefly explain the role of the learning rate in the Perceptron, so you understand why there is no difference in the final error count or the accuracy score.
The Perceptron algorithm always arrives at a solution within a finite number of epochs (i.e. iterations or steps), no matter how big eta0 is, because this constant simply ends up multiplying the learned weights during fitting.
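To make this concrete, here is a minimal NumPy sketch of the classic perceptron update rule (illustrative only, not the book's or scikit-learn's code): with zero-initialized weights, every update adds eta0 * y_i * x_i, and since multiplying the weight vector by a positive constant never changes the sign of the decision function, the same samples trigger updates in the same order and the final weights simply scale with eta0.
import numpy as np

def fit_perceptron(X, y, eta0, n_iter):
    # classic perceptron; y is expected in {-1, +1}, weights start at zero
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(n_iter):
        for xi, yi in zip(X, y):
            # misclassified (or on the boundary): add eta0 * yi * xi
            if yi * (np.dot(w, xi) + b) <= 0:
                w += eta0 * yi * xi
                b += eta0 * yi
    return w, b

X_toy = np.array([[1., 2., 3.], [4., 5., 6.], [1., 2., 3.]])
y_toy = np.array([1, -1, 1])
print(fit_perceptron(X_toy, y_toy, eta0=1.0, n_iter=5))
print(fit_perceptron(X_toy, y_toy, eta0=100.0, n_iter=5))   # same weights and bias, times 100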
The learning rate in other implementations (neural nets and basically everything else*) is a value that multiplies the partial derivatives of a given function during the search for the optimal minimum. Higher learning rates give a higher chance of overshooting the optimum, while lower learning rates take more time to converge (to reach the optimal point). The theory is complex, though; there is a really good chapter describing the learning rate which you should read:
http://neuralnetworksanddeeplearning.com/chap3.html
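For contrast, a tiny gradient-descent example (illustrative only, not from the linked chapter) on f(w) = w**2 shows the usual trade-off: a small learning rate creeps slowly toward the minimum at 0, while a learning rate that is too large overshoots it and diverges.
def gradient_descent(lr, steps=20, w=5.0):
    # minimize f(w) = w**2, whose gradient is 2*w
    for _ in range(steps):
        w -= lr * 2 * w
    return w

print(gradient_descent(lr=0.1))    # converges slowly toward 0
print(gradient_descent(lr=0.45))   # converges much faster
print(gradient_descent(lr=1.1))    # each step flips sign and grows: divergence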
Okay, now I will also show you that the learning rate in the Perceptron is only used to rescale the weights. Let us take X as our training data and y as our training labels, and fit the Perceptron with two different values of eta0, say 1.0 and 100.0:
from sklearn.linear_model import Perceptron

X = [[1, 2, 3], [4, 5, 6], [1, 2, 3]]
y = [1, 0, 1]
clf = Perceptron(eta0=1.0, n_iter=5)
clf.fit(X, y)
clf.coef_ # returns weights assigned to the input features
array([[-5., -1., 3.]])
clf = Perceptron(eta0=100.0, n_iter=5)
clf.fit(X, y)
clf.coef_
array([[-500., -100., 300.]])
As you can see, the learning rate in the Perceptron only rescales the weights of the model (leaving their signs unchanged), so the accuracy score and the error count stay the same.
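You can verify this on the same toy data (a quick check continuing the snippet above; decision_function returns the raw scores w·x + b):
clf1 = Perceptron(eta0=1.0, n_iter=5).fit(X, y)
clf2 = Perceptron(eta0=100.0, n_iter=5).fit(X, y)
print(clf1.decision_function(X))                    # raw scores
print(clf2.decision_function(X))                    # the same scores, times 100
print((clf1.predict(X) == clf2.predict(X)).all())   # True: identical predictions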
Hope that suffices. E.
Upvotes: 1