elena faillace

Reputation: 35

Explanation of some Perceptron parameters from scikit-learn

I need to use the perceptron algorithm to study the learning rate and the asymptotic error of some datasets which are not linearly separable.
In order to do this, I need to understand a few parameters of the constructor. I have spent hours googling them, but I still don't quite understand what they do or how to use them.
The ones that give me the most trouble are alpha and eta0.

I understand that every update of the algorithm is:

w(t+1) = w(t) + r * (d - y(t)) * x
where (d-y(t)) just gives the desired + or -, in order to increase or decrease the component of the vector, and r is the learning rate that smooths the update.
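In code, my understanding of one update step is roughly this (a sketch in NumPy, assuming labels in {0, 1}; the function is mine, not scikit-learn's internals):

    import numpy as np

    def perceptron_update(w, x, d, r=1.0):
        # y(t): current prediction for sample x with weights w
        y = 1 if np.dot(w, x) >= 0 else 0
        # w(t+1) = w(t) + r * (d - y(t)) * x
        return w + r * (d - y) * x

    # e.g. one step starting from zero weights
    w = perceptron_update(np.zeros(2), np.array([1.0, 2.0]), d=0, r=0.5)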

From the scikit-learn documentation (https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Perceptron.html):
'alpha' is a constant that multiplies the regularization term if regularization is used.
'eta0' is a constant by which the updates are multiplied.

What is the regularization term (alpha) in the perceptron? In which part of the formula does it appear?
Is eta0 the 'r' in the formula above?
Both of these parameters should slow the algorithm down but make it more effective; I would like to understand how to use them at their best.

Thank you in advance; I will appreciate any answer, even an incomplete one.

Upvotes: 1

Views: 1697

Answers (1)

Bahman Rouhani

Reputation: 1259

First, let me address this:

where (d-y(t)) just gives the desired + or -, in order to increase or decrease the component of the vector

To be more precise, (d - y(t)) is the difference between the desired output and the actual output. It makes sense that our correction should be proportional to the size of the error (and the update rule can be derived mathematically).

What is the regularization term (alpha) in the perceptron? in which part of the formula appears?

From the scikit-learn docs on Perceptron:

Perceptron is a classification algorithm which shares the same underlying implementation with SGDClassifier. In fact, Perceptron() is equivalent to SGDClassifier(loss="perceptron", eta0=1, learning_rate="constant", penalty=None).

and on SGDClassifier:

The regularizer is a penalty added to the loss function that shrinks model parameters towards the zero vector using either the squared euclidean norm L2 or the absolute norm L1 or a combination of both (Elastic Net). If the parameter update crosses the 0.0 value because of the regularizer, the update is truncated to 0.0 to allow for learning sparse models and achieve online feature selection.

So there you have it: the regularization term keeps the model parameters as small as possible. The same idea is used in neural networks.
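For instance, a rough sketch of how alpha comes into play (the dataset and parameter values are arbitrary, just for illustration; with penalty=None, alpha has no effect):

    from sklearn.datasets import make_classification
    from sklearn.linear_model import Perceptron

    X, y = make_classification(n_samples=200, random_state=0)

    # no regularization: alpha is ignored
    plain = Perceptron(penalty=None).fit(X, y)

    # L2 regularization: alpha scales the penalty that shrinks the weights
    regularized = Perceptron(penalty='l2', alpha=0.01).fit(X, y)

    print(abs(plain.coef_).sum())        # typically larger
    print(abs(regularized.coef_).sum())  # typically shrunk toward zero

So alpha only matters once you set penalty to 'l1', 'l2' or 'elasticnet'.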

Is the eta0 the 'r' of the formula above?

Yes; the learning rate is usually denoted by eta.
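To make the correspondence concrete, here is a minimal sketch (the eta0 value is arbitrary): per the equivalence quoted above, these two classifiers should behave the same.

    from sklearn.linear_model import Perceptron, SGDClassifier

    # eta0 plays the role of r in the update formula above;
    # Perceptron keeps the learning rate constant across iterations
    p = Perceptron(eta0=0.5)

    # the equivalent SGDClassifier form, per the docs quoted earlier
    s = SGDClassifier(loss="perceptron", eta0=0.5,
                      learning_rate="constant", penalty=None)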

Upvotes: 1
