Noah Ratliff

Reputation: 307

Issue with optimizing parameters in basic multi-layer perceptron

I've just recently been getting into TensorFlow, but I've been having trouble expanding from a simple one-layer neural network to a multi-layer one. I've pasted my attempt below; any help with why it's not working would be greatly appreciated!

import tensorflow as tf
from tqdm import trange
from tensorflow.examples.tutorials.mnist import input_data

# Import data
mnist = input_data.read_data_sets("datasets/MNIST_data/", one_hot=True)

x = tf.placeholder(tf.float32, [None, 784])
W0 = tf.Variable(tf.zeros([784, 500]))
b0 = tf.Variable(tf.zeros([500]))
y0 = tf.matmul(x, W0) + b0
relu0 = tf.nn.relu(y0)
W1 = tf.Variable(tf.zeros([500, 100]))
b1 = tf.Variable(tf.zeros([100]))
y1 = tf.matmul(relu0, W1) + b1
relu1 = tf.nn.relu(y1)
W2 = tf.Variable(tf.zeros([100, 10]))
b2 = tf.Variable(tf.zeros([10]))
y2 = tf.matmul(relu1, W2) + b2
y = y2


# Define loss and optimizer
y_ = tf.placeholder(tf.float32, [None, 10])
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y))
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)

# Create a Session object, initialize all variables
sess = tf.Session()
sess.run(tf.global_variables_initializer())
# Train
for _ in trange(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})

# Test trained model
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print('Test accuracy: {0}'.format(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels})))

sess.close()

PS: I know this can be achieved much more easily with Keras or even the prebuilt TensorFlow layers, but I am trying to get a more basic understanding of the math behind the library. Thanks!

Upvotes: 1

Views: 306

Answers (1)

MadLordDev

Reputation: 260

You have two things to take into consideration.

1) Change tf.Variable(tf.zeros([784, 500])) to tf.Variable(tf.random_normal([784, 500])). It is better to initialize the weights randomly than to define them all as 0 from the start. When every weight in a layer starts at 0, every unit computes the same output and receives the same gradient, so the units all follow the same gradient path and the model is unable to learn. To start with, change every tf.zeros weight initializer to tf.random_normal. There are better initialization schemes, but this will give you a good start. See the sketch below this point.
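For example, the weight definitions from the question could become the following. This is a minimal sketch; the stddev=0.1 is an assumption you may want to tune, and the biases can safely stay at zero because the random weights already break the symmetry:

# Small random weights break the symmetry between units in a layer.
W0 = tf.Variable(tf.random_normal([784, 500], stddev=0.1))
b0 = tf.Variable(tf.zeros([500]))  # biases may remain zero
W1 = tf.Variable(tf.random_normal([500, 100], stddev=0.1))
b1 = tf.Variable(tf.zeros([100]))
W2 = tf.Variable(tf.random_normal([100, 10], stddev=0.1))
b2 = tf.Variable(tf.zeros([10]))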

2) Your learning rate is too high. With a rate of 0.5 the gradient steps are so large that the loss oscillates or blows up instead of converging. Change train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy) to

train_step = tf.train.GradientDescentOptimizer(0.005).minimize(cross_entropy)
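If you want to see the effect yourself, you can sweep a few learning rates and compare test accuracy. This is a rough sketch that reuses the graph from the question (it assumes x, y_, cross_entropy, accuracy, mnist, and sess are already defined, and the rate values are just assumptions to try):

# Compare a few learning rates, re-initializing the variables between runs.
for lr in (0.5, 0.05, 0.005):
    train_step = tf.train.GradientDescentOptimizer(lr).minimize(cross_entropy)
    sess.run(tf.global_variables_initializer())  # reset weights for a fair comparison
    for _ in range(1000):
        batch_xs, batch_ys = mnist.train.next_batch(100)
        sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
    print('lr={0}: test accuracy {1}'.format(
        lr, sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels})))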

Upvotes: 1
