I have just begun studying TensorFlow, and I want to create a DNN for MNIST. The tutorial presents a very simple neural network with 784 input nodes, 10 output nodes, and no hidden nodes. I tried to modify that code by adding a hidden layer with 500 nodes between the input and output layers, but the test accuracy is just 10%, which means the network is not being trained. Do you know what's wrong with my code? Here it is:
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
import os
os.chdir('../')
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
x=tf.placeholder(tf.float32,[None,784])
W_h1=tf.Variable(tf.zeros([784,500]))
B_h1=tf.Variable(tf.zeros([500]))
h1=tf.nn.relu(tf.matmul(x,W_h1)+B_h1)
'''
W_h2=tf.Variable(tf.zeros([5,5]))
B_h2=tf.Variable(tf.zeros([5]))
h2=tf.nn.relu(tf.matmul(h1,W_h2)+B_h2)
'''
B_o=tf.Variable(tf.zeros([10]))
W_o=tf.Variable(tf.zeros([500,10]))
y=tf.nn.relu(tf.matmul(h1,W_o)+B_o)
y_=tf.placeholder(tf.float32, [None, 10])
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]))
train_step = tf.train.GradientDescentOptimizer(0.05).minimize(cross_entropy)
sess = tf.InteractiveSession()
tf.global_variables_initializer().run()
number_steps = 10000
batch_size = 100
for _ in range(number_steps):
    batch_xs, batch_ys = mnist.train.next_batch(batch_size)
    train=sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
# Print classifier's accuracy
print(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}))
OK, following @lejlot's suggestion, I changed my code as follows:
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
import os
os.chdir('../')
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
x=tf.placeholder(tf.float32,[None,784])
W_h1=tf.Variable(tf.random_normal([784,500]))
B_h1=tf.Variable(tf.random_normal([500]))
h1=tf.nn.relu(tf.matmul(x,W_h1)+B_h1)
'''
W_h2=tf.Variable(tf.random_normal([500,500]))
B_h2=tf.Variable(tf.random_normal([500]))
h2=tf.nn.relu(tf.matmul(h1,W_h2)+B_h2)
'''
B_o=tf.Variable(tf.random_normal([10]))
W_o=tf.Variable(tf.random_normal([500,10]))
y= tf.matmul(h1,W_o)+B_o # notice no activation
y_=tf.placeholder(tf.float32, [None, 10])
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.nn.log_softmax(y),  # notice log_softmax
                                              reduction_indices=[1]))
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
sess = tf.InteractiveSession()
tf.global_variables_initializer().run()
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
number_steps = 10000
batch_size = 100
for i in range(number_steps):
    batch_xs, batch_ys = mnist.train.next_batch(batch_size)
    train=sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
    if i % 1000==0:
        acc=sess.run(accuracy,feed_dict={x: mnist.test.images, y_: mnist.test.labels})
        print('Current loop %d, Accuracy: %g'%(i,acc))
# Print classifier's accuracy
print(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}))
There are two modifications:
1. changed the initial values of W_h1 and B_h1 to tf.random_normal
2. changed the definitions of y and cross_entropy
The modifications do work, but I still don't know what's wrong with my original code. I do call tf.global_variables_initializer().run(), and I thought this function would randomise the values of W_h1 and B_h1. Besides, if I define y and cross_entropy as follows, it still doesn't work:
y= tf.nn.softmax(tf.matmul(h1,W_o)+B_o)
y_=tf.placeholder(tf.float32, [None, 10])
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y),reduction_indices=[1]))
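As for the initialiser question above, one way to check what tf.global_variables_initializer() actually does is to print a zero-initialised variable right after running it. A minimal sketch using the same TF 1.x API as above (the variable name is illustrative):
import tensorflow as tf

w = tf.Variable(tf.zeros([3]))           # zeros initialiser, as in the original code
sess = tf.InteractiveSession()
tf.global_variables_initializer().run()  # runs each variable's init op; it does not randomise anything
print(sess.run(w))                       # [0. 0. 0.] -- the zeros are simply assigned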
Answer (from @lejlot):
First of all, this is not a valid classifier model:
y=tf.nn.relu(tf.matmul(h1,W_o)+B_o)
y_=tf.placeholder(tf.float32, [None, 10])
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]))
You are using the explicit equation for cross entropy, which requires y to be a (row-wise) probability distribution, yet you produce y by applying relu, meaning that you are simply outputting some non-negative numbers. In fact, if you ever output zeros, your code will produce NaNs and fail (as the log of 0 is minus infinity).
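For instance, a two-line check of that failure mode (a minimal sketch using the same TF 1.x session API as the question):
import tensorflow as tf

sess = tf.InteractiveSession()
# log(0) is -inf, and 0 * -inf is NaN, so a single zero in y poisons the loss
print(sess.run(tf.log(tf.constant(0.0))))                     # -inf
print(sess.run(tf.constant(0.0) * tf.log(tf.constant(0.0))))  # nan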
You should use
y = tf.nn.softmax(tf.matmul(h1,W_o)+B_o)
instead. Or, even better (for numerical stability):
y= tf.matmul(h1,W_o)+B_o # notice no activation
y_=tf.placeholder(tf.float32, [None, 10])
cross_entropy = tf.reduce_mean(
    -tf.reduce_sum(y_ * tf.nn.log_softmax(y),  # notice log_softmax
                   reduction_indices=[1]))
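To see why the log_softmax form is more stable, consider large logits: in float32, softmax can underflow to exactly 0 for the losing classes, so tf.log(tf.nn.softmax(...)) yields -inf, while tf.nn.log_softmax computes the same quantity without ever materialising the zero. A minimal sketch (the logit values are illustrative extremes):
import tensorflow as tf

sess = tf.InteractiveSession()
logits = tf.constant([[1000.0, 0.0]])
print(sess.run(tf.log(tf.nn.softmax(logits))))  # [[  0. -inf]] -- softmax underflows to 0
print(sess.run(tf.nn.log_softmax(logits)))      # [[    0. -1000.]] -- same quantity, computed stably
This also explains why the question's softmax + tf.log variant fails with tf.random_normal initialisation: the unscaled random weights produce large logits, the softmax saturates to exact zeros, and the loss becomes NaN.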
The second issue is initialisation - you cannot initialise neural network weights to zeros; they have to be random numbers, typically sampled from low-variance, zero-mean Gaussians. The global initialiser does not randomise weights, it simply runs all the initialisation ops - if an initialisation op is a constant one (like zeros), it just makes sure those zeros are assigned to the variables, nothing else (thus it can be used to reset the network, etc.). Zero initialisation only works for convex problems, such as logistic regression; it cannot work for a complex model like a neural network. In your network in particular, zero weights make every hidden activation relu(0) = 0, so no useful gradient ever reaches the weight matrices and training cannot make progress.
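To see this concretely, here is a minimal sketch (assuming the stable log_softmax loss from above, to isolate the initialisation problem) that reuses the question's layer shapes and inspects the gradients: with all-zero weights, the gradients with respect to both weight matrices come out exactly zero, so gradient descent never moves them.
import tensorflow as tf
import numpy as np

x  = tf.placeholder(tf.float32, [None, 784])
y_ = tf.placeholder(tf.float32, [None, 10])
W_h1 = tf.Variable(tf.zeros([784, 500]))   # zero init, as in the original code
B_h1 = tf.Variable(tf.zeros([500]))
h1 = tf.nn.relu(tf.matmul(x, W_h1) + B_h1) # h1 is all zeros
W_o = tf.Variable(tf.zeros([500, 10]))
B_o = tf.Variable(tf.zeros([10]))
y = tf.matmul(h1, W_o) + B_o
loss = tf.reduce_mean(-tf.reduce_sum(y_ * tf.nn.log_softmax(y), reduction_indices=[1]))

grads = tf.gradients(loss, [W_h1, W_o])
sess = tf.InteractiveSession()
tf.global_variables_initializer().run()
batch_x = np.random.rand(4, 784).astype(np.float32)  # dummy inputs
batch_y = np.eye(10, dtype=np.float32)[:4]           # dummy one-hot labels
g_h1, g_o = sess.run(grads, feed_dict={x: batch_x, y_: batch_y})
print(np.abs(g_h1).max(), np.abs(g_o).max())         # 0.0 0.0 -- no learning signal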