sau

Reputation: 1356

Sudden drop in accuracy while training a deep neural net

I am using mxnet to train an 11-class image classifier. I am observing a weird behavior: training accuracy was increasing slowly, went up to 39%, and then in the next epoch it dropped to 9%, where it stayed for the rest of the training. I restarted the training from the saved model (the one with 39% training accuracy), keeping all other parameters the same, and now training accuracy is increasing again. What can be the reason here? I am not able to understand it, and it is getting difficult to train the model this way because it requires me to watch the training accuracy values constantly.

The learning rate is constant at 0.01.

Upvotes: 18

Views: 27057

Answers (5)

Aditya Kane

Reputation: 411

These problems come up often. In my experience, this usually happens for one of the following reasons:

  1. Something in the pipeline is returning NaN
  2. The inputs to the network are not what it expects (many modern frameworks do not raise errors in some of these cases)
  3. The model layers receive incompatible shapes at some point
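A quick way to rule out the first two points is a small sanity check on a few batches before training. Here is a minimal sketch using NumPy; the function name and the assertions are only illustrative and assume you can pull batches out of your data iterator as arrays:

    import numpy as np

    def sanity_check_batch(data, labels):
        """Run on a few batches before training to catch bad inputs early."""
        data, labels = np.asarray(data), np.asarray(labels)
        # 1. nothing coming out of the pipeline should already be NaN
        assert not np.isnan(data).any(), "NaN found in the input batch"
        assert not np.isnan(labels).any(), "NaN found in the labels"
        # 2. inputs should look the way the network expects
        assert data.shape[0] == labels.shape[0], "data/label batch sizes differ"
        assert 0 <= labels.min() and labels.max() <= 10, "labels outside the 11 classes"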

Upvotes: 0

ChenJianfeng

Reputation: 1

I faced the same problem, and I solved it by using a quadratic loss (y - a)^2 instead of the cross-entropy function (because of log(0)). I hope there is a better solution for this problem.
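For context, the NaN comes from the log inside the cross-entropy rather than from the data itself. A minimal NumPy illustration (the arrays are made up, purely to show the effect):

    import numpy as np

    prob  = np.array([1.0, 0.0, 0.0])   # predicted probabilities containing exact zeros
    label = np.array([0.0, 1.0, 0.0])   # one-hot label

    # cross-entropy: log(0) gives -inf and 0 * log(0) gives nan, so the whole sum is nan
    print(-np.sum(label * np.log(prob)))   # -> nan (with runtime warnings)

    # the quadratic loss (y - a)^2 has no log, so exact zeros are harmless
    print(np.sum((label - prob) ** 2))     # -> 2.0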

Upvotes: 0

Leopd

Reputation: 42757

It is common during training of neural networks for accuracy to improve for a while and then get worse -- in general this is caused by over-fitting. It's also fairly common for the network to "get unlucky" and get knocked into a bad part of parameter space corresponding to a sudden decrease in accuracy -- sometimes it can recover from this quickly, but sometimes not.

In general, lowering your learning rate is a good approach to this kind of problem. Also, setting a learning rate schedule like FactorScheduler can help you achieve more stable convergence by lowering the learning rate every few epochs. In fact, this can sometimes cover up mistakes in picking an initial learning rate that is too high.
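In MXNet this can be wired in through the optimizer parameters. A minimal sketch, assuming the Module API; `net`, `train_iter` and `val_iter` stand for your existing 11-class symbol and data iterators, and the step/factor values are only illustrative:

    import mxnet as mx

    # lower the learning rate by half every 5000 batches (values are illustrative)
    lr_sched = mx.lr_scheduler.FactorScheduler(step=5000, factor=0.5)

    mod = mx.mod.Module(symbol=net, context=mx.gpu(0))
    mod.fit(train_iter,
            eval_data=val_iter,
            optimizer='sgd',
            optimizer_params={'learning_rate': 0.01,
                              'lr_scheduler': lr_sched},
            num_epoch=50)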

Upvotes: 6

Mizuki

Reputation: 17

It probably happened because 0 * log(0) returns NaN.

You might avoid it by clipping the predicted probabilities away from zero before taking the log:

cross_entropy = -tf.reduce_sum(labels * tf.log(tf.clip_by_value(logits, 1e-10, 1.0)))

Upvotes: -2

mrphoenix13

Reputation: 729

As you can see, your late accuracy is close to random guessing (1/11 ≈ 9% for an 11-class problem). There are two common issues in this kind of case:

  • Your learning rate is too high; try lowering it.
  • The error (or entropy) you are using is producing a NaN value. If you use entropies with log functions, you must handle them carefully (see the sketch below).
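In MXNet's symbolic API the same clipping idea as in the TensorFlow answer above would look roughly like this (a sketch; `prob` and `label` stand for your network's softmax output and the one-hot labels):

    import mxnet as mx

    prob  = mx.sym.Variable('prob')    # predicted class probabilities (softmax output)
    label = mx.sym.Variable('label')   # one-hot encoded labels

    # keep the probabilities away from exact 0 before the log so the loss never goes NaN
    safe_prob     = mx.sym.clip(prob, 1e-10, 1.0)
    cross_entropy = -mx.sym.sum(label * mx.sym.log(safe_prob))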

Upvotes: 25
