yoshi

Reputation: 61

My TensorFlow model fails to train when I increase the number of neurons or layers

I have built a convolutional neural network model using TensorFlow to recognize handwriting, following the TensorFlow tutorials[1]. The model uses convolutional filter1: [5,5,1,16], filter2: [5,5,16,64], fully connected layers [7*7*64,1024] and [1024,10], and then applies softmax to convert the output to probabilities. I ran this model and it failed: the loss never decreased and all of the outputs were [0,0,1,0,0,0,0,0,0,0].

Then I reduced the number of filters and neurons, and training succeeded; the accuracy reached about 97%.

Why does training fail when I increase the number of filters and neurons?

Here is my failed model. (I used "mnist.csv".)

import tensorflow as tf
import numpy as np

x = tf.placeholder(tf.float32,[None,28*28])
t = tf.placeholder(tf.float32,[None,10])
def weight(shape):
   init = tf.truncated_normal(shape, stddev=0.1)
   return tf.Variable(init)
def bias(shape):
   init = tf.constant(0.1, shape=shape)
   return tf.Variable(init)

def conv2d(x,W):
   return tf.nn.conv2d(x,W,strides=[1,1,1,1],padding="SAME")
def max_pool_22(x):
   return tf.nn.max_pool(x,ksize=[1,2,2,1],strides=[1,2,2,1],padding="SAME")

W_conv1 = weight([5,5,1,16])
b_conv1 = bias([16])

x_image = tf.reshape(x,[-1,28,28,1])


h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)

h_pool1 = max_pool_22(h_conv1)
print(h_pool1.shape)

W_conv2 = weight([5,5,16,64])
b_conv2 = bias([64])

h_conv2 = tf.nn.relu(conv2d(h_pool1,W_conv2) + b_conv2)
h_pool2 = max_pool_22(h_conv2)
W_fc1 = weight([7*7*64,1024])
b_fc1 = bias([1024])

h_pool2_flat = tf.reshape(h_pool2,[-1,7*7*64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat,W_fc1) + b_fc1)

W_fc2 = weight([1024,10])
b_fc2 = bias([10])

prediction = tf.nn.softmax(tf.matmul(h_fc1,W_fc2) + b_fc2) 
cross_entropy=tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=t,logits=prediction))
train_step = tf.train.AdamOptimizer().minimize(cross_entropy)

correct_prediction =tf.equal(tf.argmax(prediction,1),tf.argmax(t,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction,tf.float32))

sess = tf.InteractiveSession()
sess.run(tf.global_variables_initializer())
for epoch in range(20):
   avg_loss = 0.
   avg_accuracy = 0.
   for i in range(1000):
       ind = np.random.choice(len(x_train),50)
       x_train_batch = x_train[ind]
       t_train_batch = t_train[ind]
       _, loss, a = sess.run([train_step,cross_entropy, accuracy],feed_dict={x:x_train_batch,t:t_train_batch})
       avg_loss += loss/1000
       avg_accuracy += a/1000
   if epoch % 1 == 0:
      print("Step:{0} Loss:{1} TrainAccuracy:{2}".format(epoch,avg_loss,avg_accuracy))

print("test_accuracy:{0}".format(accuracy.eval(feed_dict={x:x_test,t:t_test})))

[1]: https://www.tensorflow.org/get_started/mnist/pros

Upvotes: 1

Views: 342

Answers (1)

interjay

Reputation: 110069

You are calling softmax_cross_entropy_with_logits on the output of softmax. This applies softmax twice leading to wrong results. softmax_cross_entropy_with_logits should be called on the linear output of the last layer, before applying softmax:

y = tf.matmul(h_fc1,W_fc2) + b_fc2
cross_entropy=tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=t, logits=y))

prediction_probabilities = tf.nn.softmax(y)
prediction_class = tf.argmax(y, 1)

The prediction_probabilities tensor above is only needed if you need the probabilities of each class. Otherwise, you can call argmax on y directly to get the predicted class.
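To see why the double softmax makes the loss plateau, here is a small standalone sketch (plain NumPy, not part of the original code) comparing the class probabilities produced by a single softmax with those produced when softmax is applied twice, as the buggy cross_entropy line effectively does:

import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

logits = np.array([4.0, 1.0, -2.0])    # hypothetical raw scores from the last layer

p_single = softmax(logits)             # what the loss should see
p_double = softmax(softmax(logits))    # what the buggy cross_entropy line effectively sees

print(p_single)   # ~[0.950, 0.047, 0.002]  -- sharp, informative distribution
print(p_double)   # ~[0.558, 0.226, 0.216]  -- squashed toward uniform

Because softmax outputs lie in [0, 1], the second softmax receives nearly equal inputs and returns a nearly uniform distribution. For 10 classes its largest possible output is e/(e+9) ≈ 0.23, so the cross-entropy is bounded below by about -ln(0.23) ≈ 1.46, which is why the loss appears stuck no matter how the weights change.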

Upvotes: 1
