Dave.Cheng

Reputation: 57

Tensorflow: loss becomes 'NaN'

I was training CIFAR-10 on the CPU with TensorFlow. During the first rounds the loss seemed alright, but after step 10210 the loss fluctuates wildly and eventually becomes NaN.

My network is the CIFAR-10 CNN model from the TensorFlow website. Here are my settings:

image_size = 32
num_channels = 3
num_classes = 10
num_batches_to_run = 50000
batch_size = 128
eval_batch_size = 64
initial_learning_rate = 0.1
learning_rate_decay_factor = 0.1
num_epochs_per_decay = 350.0
moving_average_decay = 0.9999

The result is shown below:

2017-05-12 21:53:05.125242: step 10210, loss = 4.99 (124.9 examples/sec; 1.025 sec/batch)
2017-05-12 21:53:13.960001: step 10220, loss = 7.55 (139.5 examples/sec; 0.918 sec/batch)
2017-05-12 21:53:23.491228: step 10230, loss = 6.63 (149.5 examples/sec; 0.856 sec/batch)
2017-05-12 21:53:33.355805: step 10240, loss = 8.08 (113.3 examples/sec; 1.129 sec/batch)
2017-05-12 21:53:43.007007: step 10250, loss = 7.18 (126.7 examples/sec; 1.010 sec/batch)
2017-05-12 21:53:52.650118: step 10260, loss = 16.61 (138.0 examples/sec; 0.928 sec/batch)
2017-05-12 21:54:02.537279: step 10270, loss = 9.60 (137.6 examples/sec; 0.930 sec/batch)
2017-05-12 21:54:12.390117: step 10280, loss = 46526.25 (145.5 examples/sec; 0.880 sec/batch)
2017-05-12 21:54:22.060741: step 10290, loss = 133479743509972411931057146822656.00 (130.4 examples/sec; 0.982 sec/batch)
2017-05-12 21:54:31.691058: step 10300, loss = nan (115.8 examples/sec; 1.105 sec/batch)

Any idea about the NaN loss?

Upvotes: 2

Views: 3819

Answers (3)

Martin Thoma

Reputation: 136695

You are probably using a cross-entropy loss and taking log(0) somewhere. Just add a small constant inside the log.
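
For example, a minimal sketch of that fix in TF1-style code; the names labels and probs are illustrative stand-ins, not taken from the question:

 import tensorflow as tf

 # Illustrative placeholders standing in for the model's tensors.
 labels = tf.placeholder(tf.float32, [None, 10])  # one-hot labels
 probs = tf.placeholder(tf.float32, [None, 10])   # softmax output of the net

 # Without the epsilon, log(0) = -inf as soon as a predicted
 # probability hits exactly 0, and the loss becomes NaN.
 epsilon = 1e-10
 cross_entropy = -tf.reduce_sum(labels * tf.log(probs + epsilon), axis=1)
 loss = tf.reduce_mean(cross_entropy)

Alternatively, tf.nn.softmax_cross_entropy_with_logits computes the loss from the raw logits in a numerically stable way and avoids the problem entirely.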

(You might also want to look into gradient clipping.)

Upvotes: 0

Simba

Reputation: 1651

This happens a lot in practice when your learning rate is too high. I tend to start at 0.001 and move from there; 0.1 is on the very high side for most datasets, especially if you aren't dividing your loss by your batch size. A sketch of both changes is below.
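
As an illustration against the question's settings (the tiny stand-in model is hypothetical, not the asker's network):

 import tensorflow as tf

 # Minimal stand-in for the network; names are not from the question.
 images = tf.placeholder(tf.float32, [None, 32 * 32 * 3])
 labels = tf.placeholder(tf.int64, [None])
 weights = tf.Variable(tf.zeros([32 * 32 * 3, 10]))
 logits = tf.matmul(images, weights)

 # reduce_mean divides by the batch size, so the gradient
 # magnitude does not grow with batch_size = 128.
 loss = tf.reduce_mean(
     tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels,
                                                    logits=logits))

 # Start at 0.001 instead of 0.1 and tune from there.
 train_op = tf.train.GradientDescentOptimizer(0.001).minimize(loss)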

Upvotes: 7

Shabaz Patel

Reputation: 291

You can clip the gradients. If you are using Keras with the TensorFlow backend, you can do it as follows.

The parameters clipnorm and clipvalue can be used with all optimizers to control gradient clipping:

 from keras import optimizers

 # All parameter gradients will be clipped to
 # a maximum norm of 1.
 sgd = optimizers.SGD(lr=0.01, clipnorm=1.)

or

 from keras import optimizers
 # All parameter gradients will be clipped to
 # a maximum value of 0.5 and
 # a minimum value of -0.5.
 sgd = optimizers.SGD(lr=0.01, clipvalue=0.5)
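
If you are using plain TensorFlow rather than Keras, the same idea can be written with compute_gradients/apply_gradients; the toy loss below is just an illustrative stand-in for the real model's loss:

 import tensorflow as tf

 # Toy loss over a single variable, standing in for the model's loss.
 w = tf.Variable(5.0)
 loss = tf.square(w)

 optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.01)
 grads_and_vars = optimizer.compute_gradients(loss)

 # Clip each gradient to a maximum L2 norm of 1.0 before applying it,
 # which matches the effect of Keras's clipnorm above.
 clipped = [(tf.clip_by_norm(g, 1.0), v)
            for g, v in grads_and_vars if g is not None]
 train_op = optimizer.apply_gradients(clipped)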

Upvotes: 1
