Reputation: 1010
I am having trouble manually recreating a model that I originally built with the layers API. Here are the two formulations, which I believe should be equivalent but somehow are not when I run them:
def create_mlp_net(input_images=input_images, reuse=False):
    with tf.variable_scope('mlp', reuse=reuse):
        l1 = tf.layers.dense(input_images, 512, activation=tf.nn.relu)
        l2 = tf.layers.dense(l1, 512, activation=tf.nn.relu)
        y = tf.layers.dense(l2, 10, activation=tf.nn.softmax)
        return y
def manual_create_mlp_net(input_images=input_images, reuse=False):
    with tf.variable_scope('mlp', reuse=reuse):
        W1 = tf.Variable(tf.zeros([784, 512]))
        b1 = tf.Variable(tf.zeros([512]))
        l1 = tf.nn.relu(tf.matmul(input_images, W1) + b1)
        W2 = tf.Variable(tf.zeros([512, 512]))
        b2 = tf.Variable(tf.zeros([512]))
        l2 = tf.nn.relu(tf.matmul(l1, W2) + b2)
        W3 = tf.Variable(tf.zeros([512, 10]))
        b3 = tf.Variable(tf.zeros([10]))
        y = tf.nn.softmax(tf.matmul(l2, W3) + b3)
        return y
The first one yields an accuracy of 97%, while the manual one yields only 11%. I can't figure out why, since they should be identical. The minimal working code that I am using to run this is below.
Edit: Based on the answer below by NPE, the initialization was the problem. The manual implementation below is the closest to what the layers API does:
def manual_create_mlp_net(input_images=input_images, reuse=False):
    with tf.variable_scope('mlp', reuse=reuse):
        W1 = tf.get_variable('w1', shape=[784, 512])
        b1 = tf.Variable(tf.zeros([512]))
        l1 = tf.nn.relu(tf.matmul(input_images, W1) + b1)
        W2 = tf.get_variable('w2', shape=[512, 512])
        b2 = tf.Variable(tf.zeros([512]))
        l2 = tf.nn.relu(tf.matmul(l1, W2) + b2)
        W3 = tf.get_variable('w3', shape=[512, 10])
        b3 = tf.Variable(tf.zeros([10]))
        y = tf.nn.softmax(tf.matmul(l2, W3) + b3)
        return y
import numpy as np
import tensorflow as tf

# Load training and eval data
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

input_images = tf.placeholder(tf.float32, [None, 784], name='input_images')
input_labels = tf.placeholder(tf.float32, [None, 10], name='input_labels')

y_api = create_mlp_net(input_images, reuse=False)
y_man = manual_create_mlp_net(input_images, reuse=False)
y_use = y_api  # Changing this to y_man does not yield the same result

cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits_v2(labels=input_labels, logits=y_use))
train_step = tf.train.RMSPropOptimizer(0.001).minimize(cross_entropy)

with tf.Session() as sess:
    tf.global_variables_initializer().run()
    for _ in range(1000):
        batch_xs, batch_ys = mnist.train.next_batch(100)
        sess.run(train_step, feed_dict={input_images: batch_xs, input_labels: batch_ys})

    correct_prediction = tf.equal(tf.argmax(y_use, 1), tf.argmax(input_labels, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    print(sess.run(accuracy, feed_dict={input_images: mnist.test.images,
                                        input_labels: mnist.test.labels}))
Upvotes: 3
Views: 61
Reputation: 500873
The root cause is the fact that you initialize all three weight matrices to zero.
This doesn't work because it leaves no asymmetry between the neurons in a layer: they all compute the same output and receive the same gradient, so there is no way for them to learn different things.
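To see the symmetry concretely, here is a small standalone sketch (my own, not from the question, and assuming TensorFlow 1.x) that inspects the gradients of a zero-initialized network on one random batch. Every unit's incoming weights receive exactly the same gradient (with ReLU it is in fact zero), so no update can ever make the units differ from one another:

import numpy as np
import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 784])
t = tf.placeholder(tf.float32, [None, 10])

# Zero-initialized two-layer net, mirroring the failing manual version
W1 = tf.Variable(tf.zeros([784, 512]))
b1 = tf.Variable(tf.zeros([512]))
h = tf.nn.relu(tf.matmul(x, W1) + b1)
W2 = tf.Variable(tf.zeros([512, 10]))
b2 = tf.Variable(tf.zeros([10]))
logits = tf.matmul(h, W2) + b2

loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits_v2(labels=t, logits=logits))
grad_W1, grad_W2 = tf.gradients(loss, [W1, W2])

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    xs = np.random.rand(100, 784).astype(np.float32)
    ts = np.eye(10, dtype=np.float32)[np.random.randint(0, 10, 100)]
    gW1, gW2 = sess.run([grad_W1, grad_W2], {x: xs, t: ts})
    # Every column (one column per unit) of each weight gradient is
    # identical -- here even identically zero -- so the units never diverge.
    print(np.allclose(gW1, gW1[:, :1]), np.allclose(gW2, gW2[:, :1]))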
The recommended practice for non-linear models is to use small random initial weights. tf.layers.dense does this by delegating variable creation to get_variable(), which in turn defaults to glorot_uniform_initializer.
Note that this applies only to the weights; the biases are initialized to zero, just as in your code.
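If you would rather spell the defaults out than rely on get_variable() picking them up implicitly, a single layer written like the following sketch (assuming TensorFlow 1.x; the variable names are just illustrative) mirrors what tf.layers.dense sets up: Glorot-uniform weights and zero biases.

# Explicit equivalent of one tf.layers.dense(..., activation=tf.nn.relu) layer
W1 = tf.get_variable('w1', shape=[784, 512],
                     initializer=tf.glorot_uniform_initializer())
b1 = tf.get_variable('b1', shape=[512],
                     initializer=tf.zeros_initializer())
l1 = tf.nn.relu(tf.matmul(input_images, W1) + b1)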
For a discussion of Xavier initializers, see Why should the initialization of weights and bias be chosen around 0?
Upvotes: 2