Stephen

Reputation: 8840

Why does the epoch counter reset when training SGDClassifier?

My understanding of an epoch is that it is one full pass through the entire training set, so the epoch number counts how many passes have been made so far. But when I train SGDClassifier with verbose=True, I see the output below. The counter resets after 5 epochs and starts again at 1. Why would it do that?

Here is how I am instantiating the model:

from sklearn import linear_model

clf = linear_model.SGDClassifier(loss='log', verbose=True)
clf.fit(X_train, y_train)

And here is the output:

-- Epoch 1
Norm: 5.26, NNZs: 448659, Bias: -5.164052, T: 1912007, Avg. loss: 0.005248
Total training time: 0.91 seconds.
-- Epoch 2
Norm: 5.13, NNZs: 448659, Bias: -5.286860, T: 3824014, Avg. loss: 0.004764
Total training time: 1.72 seconds.
-- Epoch 3
Norm: 5.07, NNZs: 448659, Bias: -5.353568, T: 5736021, Avg. loss: 0.004655
Total training time: 2.57 seconds.
-- Epoch 4
Norm: 5.03, NNZs: 448659, Bias: -5.398900, T: 7648028, Avg. loss: 0.004587
Total training time: 3.41 seconds.
-- Epoch 5
Norm: 5.00, NNZs: 448659, Bias: -5.432728, T: 9560035, Avg. loss: 0.004547
Total training time: 4.28 seconds.
-- Epoch 1
Norm: 5.33, NNZs: 448659, Bias: -5.161117, T: 1912007, Avg. loss: 0.009731
Total training time: 0.98 seconds.
-- Epoch 2
Norm: 5.23, NNZs: 448659, Bias: -5.276683, T: 3824014, Avg. loss: 0.009210
Total training time: 1.84 seconds.

Upvotes: 0

Views: 663

Answers (1)

Vivek Kumar

Reputation: 36609

That's because SGDClassifier uses a One-vs-Rest strategy for multi-class problems.

From the documentation:

SGDClassifier supports multi-class classification by combining multiple binary classifiers in a “one versus all” (OVA) scheme. For each of the K classes, a binary classifier is learned that discriminates between that and all other K-1 classes.

So if your data has 4 different classes, 4 separate binary models are trained one after another, and each of them prints its own epoch count starting from 1.
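One way to check this without reading the verbose log is to look at the fitted coefficients: there should be one row per class. This is only a small sketch on synthetic data; the dataset from make_classification, the default hinge loss, and the max_iter=5, tol=None settings are my assumptions for illustration, not the asker's setup.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

# Synthetic 4-class problem (illustrative data, not the asker's).
X, y = make_classification(
    n_samples=200,
    n_features=10,
    n_informative=5,
    n_classes=4,
    n_clusters_per_class=1,
    random_state=0,
)

# tol=None disables the stopping criterion so every binary model
# runs exactly max_iter epochs.
clf = SGDClassifier(max_iter=5, tol=None, random_state=0)
clf.fit(X, y)

n_classes = len(clf.classes_)
coef_shape = clf.coef_.shape            # one weight vector per class
intercept_shape = clf.intercept_.shape  # one bias per class

print(n_classes, coef_shape, intercept_shape)
```

Each row of coef_ belongs to one "this class vs. the rest" model, which is why the epoch counter in the verbose output starts over once per class.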

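In the scikit-learn version used here, the default number of epochs (max_iter param) is 5, so each binary model prints up to that many epochs before the next one starts. Newer releases default to max_iter=1000 with a tol-based stopping criterion, so the count per model can differ there.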

In a simple binary classification task, only a single model is trained, so the verbose output contains just one run of epochs.
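The binary case can be checked the same way. Again a sketch on synthetic data, with max_iter=5 and tol=None chosen by me so the number of epochs is fixed:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

# Synthetic two-class problem (illustrative data only).
X, y = make_classification(
    n_samples=200, n_features=10, n_classes=2, random_state=0
)

# tol=None turns off the stopping criterion, so all 5 epochs run.
clf = SGDClassifier(max_iter=5, tol=None, random_state=0)
clf.fit(X, y)

binary_coef_shape = clf.coef_.shape  # a single weight vector
epochs_run = clf.n_iter_             # epochs actually performed

print(binary_coef_shape, epochs_run)
```

With two classes there is a single row in coef_, so verbose=True would show only one sequence of epochs.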

Hope you understand that now.

Upvotes: 4
