Reputation: 90
I am currently trying to teach myself TensorFlow. After thorough reading and watching videos, I tried to re-create the example provided at https://www.tensorflow.org/versions/r0.12/tutorials/mnist/beginners/index.html#mnist-for-ml-beginners However, rather than just copy and paste, I decided to make small alterations to see whether I actually understand what I am doing, so I decided to work with the CIFAR-10 dataset (small 32x32 RGB images).
The code is pretty much the basic skeleton presented in the tutorial:
# Imports
import tensorflow as tf
import numpy as np
###
### Open data files (dict)
###
def unpickle(file):
    import cPickle
    fo = open(file, 'rb')
    dict = cPickle.load(fo)
    fo.close()
    return dict
cifar10_test = unpickle('cifar-10-batches-py/test_batch')
cifar10_meta = unpickle('cifar-10-batches-py/batches.meta')
cifar10_batches = [unpickle('cifar-10-batches-py/data_batch_1'),
                   unpickle('cifar-10-batches-py/data_batch_2'),
                   unpickle('cifar-10-batches-py/data_batch_3'),
                   unpickle('cifar-10-batches-py/data_batch_4'),
                   unpickle('cifar-10-batches-py/data_batch_5')]
###
### Tensorflow Model
###
x = tf.placeholder("float", [None, 3072])
W = tf.Variable(tf.zeros([3072,10]))
b = tf.Variable(tf.zeros([10]))
y = tf.nn.softmax(tf.matmul(x,W) + b)
y_ = tf.placeholder("float", [None,10])
cross_entropy = -tf.reduce_sum(y_*tf.log(y))
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)
sess = tf.Session()
sess.run(tf.global_variables_initializer())
###
### Model training
###
for batch in cifar10_batches:
    # Convert labels to vector with zeros, but 1 at correct position
    batch['labels_vec'] = np.zeros((10000,10), dtype=float, order='C')
    for i in range(10000):
        batch['labels_vec'][i][batch['labels'][i]] = 1
    # Train in smaller sub-batches
    for i in range(3): # Breaks at first iteration, so no need to go on further
        start = i*100
        stop = start+100
        [_, cross_entropy_py] = sess.run([train_step, cross_entropy],
                                         feed_dict={x: batch['data'][start:stop],
                                                    y_: batch['labels_vec'][start:stop]})
        print 'loss = %s' % cross_entropy_py
    break # Only first batch for now
This leaves me with the output:
loss = 230.259
loss = nan
loss = nan
No error is reported on the console. I searched for people with the same problem, but only found questions about different scenarios that resulted in "nan" values.
The only things I changed from the online tutorial: the original dataset consists of 28x28-pixel greyscale images of handwritten digits, thus only 784 values per image instead of 3072. However, I believe this should not fundamentally change much, as I also changed the dimensions of the placeholders accordingly.
Additionally, my label values were given as a list of numbers between 0 and 9. I changed these to zero vectors with a 1 at the correct position, e.g. a label of 3 becomes [0 0 0 1 0 0 0 0 0 0].
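For reference, the label conversion described above can be sketched without the explicit Python loop. This is a minimal NumPy sketch; the helper name `one_hot` and its `num_classes` parameter are illustrative assumptions, not part of the original code.

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """Turn integer labels (0..num_classes-1) into one-hot row vectors."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((labels.size, num_classes), dtype=float)
    # For row i, set column labels[i] to 1 (integer array indexing).
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

example = one_hot([3, 0, 9])
print(example[0])  # the row for label 3
```

Applied to a batch, this would presumably be called as `one_hot(batch['labels'])` in place of the loop over 10000 entries.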
Some hints on where I should aim my debugging would be helpful. I initially used a larger step size of 0.1 for the GradientDescentOptimizer, but reduced it to 0.01 (the value used in the tutorial) after reading that too large a step size may cause the loss to diverge to nan.
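To illustrate how the step size interacts with the range of the input values, here is a small NumPy sketch of a single gradient-descent step for the same softmax model. It runs on random stand-in data, not the actual CIFAR-10 batches, and it assumes the pixel values are raw integers in 0..255; the function name and the 1/255 scaling are assumptions made for illustration.

```python
import numpy as np

def max_logit_after_one_step(x, labels_vec, learning_rate):
    """Largest |logit| after one gradient step from all-zero W and b."""
    n, k = labels_vec.shape
    probs = np.full((n, k), 1.0 / k)        # softmax of all-zero logits
    error = probs - labels_vec              # gradient of the summed loss w.r.t. logits
    W = -learning_rate * x.T.dot(error)     # one step starting from W = 0
    b = -learning_rate * error.sum(axis=0)  # one step starting from b = 0
    return np.abs(x.dot(W) + b).max()

rng = np.random.default_rng(0)
# Stand-in for 100 flattened 32x32x3 images with raw 0..255 values (assumption).
x_raw = rng.integers(0, 256, size=(100, 3072)).astype(float)
labels_vec = np.eye(10)[np.arange(100) % 10]  # balanced one-hot labels

raw_max = max_logit_after_one_step(x_raw, labels_vec, 0.01)
scaled_max = max_logit_after_one_step(x_raw / 255.0, labels_vec, 0.01)
print(raw_max, scaled_max)
```

In this sketch the logits after one step are several orders of magnitude larger for the raw values than for the values scaled to [0, 1], which suggests that the same step size of 0.01 can behave very differently depending on the input range.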
Thank you in advance.
Upvotes: 0
Views: 946
Reputation: 554
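Your loss is not numerically stable: as soon as the softmax outputs an exact 0 for some class, tf.log(y) is -inf and the product with a label of 0 becomes nan. Instead of writing the loss by hand, you can use one that is already implemented for multiclass logistic regression, such as tf.nn.softmax_cross_entropy_with_logits (tf.nn.sigmoid_cross_entropy_with_logits is the counterpart for independent, non-exclusive labels). These functions were carefully designed to avoid numerical problems and take the logits, tf.matmul(x,W) + b, rather than the softmax output y.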
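The instability can be reproduced outside TensorFlow. Below is a minimal NumPy sketch, not TensorFlow's actual implementation: `naive_cross_entropy` mirrors the hand-written loss from the question, while `stable_cross_entropy` computes the log of the softmax directly through the log-sum-exp trick. Both function names and the example logits are assumptions chosen for illustration.

```python
import numpy as np

def naive_cross_entropy(logits, labels):
    """Mirrors -sum(labels * log(softmax(logits))) as written in the question."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exps = np.exp(shifted)
    probs = exps / exps.sum(axis=1, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        # log(0) = -inf, and 0 * -inf = nan, which poisons the whole sum.
        return -np.sum(labels * np.log(probs))

def stable_cross_entropy(logits, labels):
    """Same loss, but log(softmax) is computed without ever taking log(0)."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -np.sum(labels * log_probs)

# Large logits, as they can occur after a big gradient step.
big_logits = np.array([[1000.0, 0.0, -1000.0]])
big_labels = np.array([[1.0, 0.0, 0.0]])
naive_big = naive_cross_entropy(big_logits, big_labels)
stable_big = stable_cross_entropy(big_logits, big_labels)

# All-zero logits for 100 examples and 10 classes, as in the first training step.
start_labels = np.eye(10)[np.arange(100) % 10]
start_loss = stable_cross_entropy(np.zeros((100, 10)), start_labels)
print(naive_big, stable_big, start_loss)
```

The all-zero case gives 100 * ln(10), which matches the first reported value of 230.259, so the initial loss in the question is as expected and only the later steps go wrong.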
Upvotes: 1