Reputation: 61
I have made a convolutional neural network model with TensorFlow to recognize handwriting, following the TensorFlow tutorial [1]. The model uses convolutional filters filter1: [5,5,1,16] and filter2: [5,5,16,32], fully connected layers [7*7*32,1024] and [1024,10], and then applies softmax to convert the output to probabilities. When I run this model, training fails: the loss never decreases and every output is [0,0,1,0,0,0,0,0,0,0].
Then I reduced the number of filters and neurons, and training succeeded with an accuracy of about 97%.
Why does training fail when I build the model with the original number of filters and neurons?
Here is my failed model (I used "mnist.csv"):
import tensorflow as tf
import numpy as np

# placeholders for flattened 28x28 images and one-hot labels
x = tf.placeholder(tf.float32, [None, 28*28])
t = tf.placeholder(tf.float32, [None, 10])

def weight(shape):
    init = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(init)

def bias(shape):
    init = tf.constant(0.1, shape=shape)
    return tf.Variable(init)

def conv2d(x, W):
    return tf.nn.conv2d(x, W, strides=[1,1,1,1], padding="SAME")

def max_pool_22(x):
    return tf.nn.max_pool(x, ksize=[1,2,2,1], strides=[1,2,2,1], padding="SAME")
W_conv1 = weight([5,5,1,16])
b_conv1 = bias([16])
x_image = tf.reshape(x,[-1,28,28,1])
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_22(h_conv1)
print(h_pool1.shape)
W_conv2 = weight([5,5,16,64])
b_conv2 = bias([64])
h_conv2 = tf.nn.relu(conv2d(h_pool1,W_conv2) + b_conv2)
h_pool2 = max_pool_22(h_conv2)
W_fc1 = weight([7*7*64,1024])
b_fc1 = bias([1024])
h_pool2_flat = tf.reshape(h_pool2,[-1,7*7*64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat,W_fc1) + b_fc1)
W_fc2 = weight([1024,10])
b_fc2 = bias([10])
prediction = tf.nn.softmax(tf.matmul(h_fc1,W_fc2) + b_fc2)
cross_entropy=tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=t,logits=prediction))
train_step = tf.train.AdamOptimizer().minimize(cross_entropy)
correct_prediction =tf.equal(tf.argmax(prediction,1),tf.argmax(t,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction,tf.float32))
sess = tf.InteractiveSession()
sess.run(tf.global_variables_initializer())
for epoch in range(20):
    avg_loss = 0.
    avg_accuracy = 0.
    for i in range(1000):
        # random mini-batch of 50 training examples
        ind = np.random.choice(len(x_train), 50)
        x_train_batch = x_train[ind]
        t_train_batch = t_train[ind]
        _, loss, a = sess.run([train_step, cross_entropy, accuracy],
                              feed_dict={x: x_train_batch, t: t_train_batch})
        avg_loss += loss/1000
        avg_accuracy += a/1000
    if epoch % 1 == 0:
        print("Step:{0} Loss:{1} TrainAccuracy:{2}".format(epoch, avg_loss, avg_accuracy))
print("test_accuracy:{0}".format(accuracy.eval(feed_dict={x: x_test, t: t_test})))
[1]: https://www.tensorflow.org/get_started/mnist/pros
Upvotes: 1
Views: 342
Reputation: 110069
You are calling softmax_cross_entropy_with_logits on the output of softmax. This applies softmax twice, leading to wrong results. softmax_cross_entropy_with_logits should be called on the linear output of the last layer, before applying softmax:
y = tf.matmul(h_fc1,W_fc2) + b_fc2
cross_entropy=tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=t, logits=y))
prediction_probabilities = tf.nn.softmax(y)
prediction_class = tf.argmax(y, 1)
The prediction_probabilities tensor above is only needed if you want the probability of each class. Otherwise, you can call argmax on y directly to get the predicted class.
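To make the effect concrete, here is a small standalone NumPy sketch (the softmax/cross_entropy helpers and the example logit values are made up for illustration, not the poster's code or the TensorFlow ops). It compares the cross-entropy obtained when the loss is computed on raw logits versus on values that have already been pushed through softmax: in the second case the loss barely changes between a confidently wrong and a confidently right prediction, which matches the "loss never decreases" symptom.
# Minimal NumPy sketch showing why feeding softmax output into
# softmax_cross_entropy_with_logits stalls training.
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def cross_entropy(labels, logits):
    # what tf.nn.softmax_cross_entropy_with_logits computes for one example
    return -np.sum(labels * np.log(softmax(logits)))

labels = np.zeros(10)
labels[3] = 1.0                                                  # true class is 3

z_wrong = np.array([5., 0., 0., -5., 0., 0., 0., 0., 0., 0.])   # confidently wrong
z_right = np.array([-5., 0., 0., 5., 0., 0., 0., 0., 0., 0.])   # confidently right

# Correct usage: loss on the raw logits clearly separates bad from good.
print(cross_entropy(labels, z_wrong))           # ~10.0
print(cross_entropy(labels, z_right))           # ~0.05

# Buggy usage: softmax output treated as logits, i.e. softmax applied twice.
print(cross_entropy(labels, softmax(z_wrong)))  # ~2.45
print(cross_entropy(labels, softmax(z_right)))  # ~1.50
With softmax applied twice, the values fed to the loss are confined to [0, 1], so for ten classes the resulting cross-entropy can only move between roughly 1.5 and 2.5 (around log 10) no matter how good the prediction is, and the gradients are correspondingly tiny.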
Upvotes: 1