Bill

Reputation: 663

Need help understanding tf.contrib.layers.fully_connected trained weights

I'm relatively new to neural nets and TensorFlow in general. In a course I'm taking, we constructed a CNN. In simplified terms, I think I can convey what is happening with a few stripped-down statements:

In the first tf.Session(), the network is trained and the parameters are stored in a Python dictionary. I monitor the cost and train/test accuracy during training in TensorBoard, and I get very reasonable results.

def forward_propagation(X, parameters):
    # ... conv/pool layers producing P2 ...
    z_out = tf.contrib.layers.fully_connected(P2, 6, activation_fn=None)

    return z_out

def model(X_train, Y_train, ...):
    ....
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        ....
        # loop through epochs and mini-batches...
        # optimize parameters...
        parameters = sess.run(parameters)
    ....

    return parameters  # python dict

Then, in a second tf.Session(), predictions are made using the trained parameters from the first session.

def predict(X, parameters):

    x = tf.placeholder("float", shape=(None, 64, 64, 3))
    z3 = forward_propagation(x, parameters)  # rebuilds the layers, including fully_connected
    a3 = tf.nn.softmax(z3, axis=1)
    p = tf.argmax(a3, axis=1)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())  # required, or fully_connected errors out
        prediction = sess.run(p, feed_dict={x: X})

    return prediction

But the second session requires sess.run(tf.global_variables_initializer()); without it, tf.contrib.layers.fully_connected() throws an error. As a result, the returned predictions are randomized and change on every run. I strongly suspect the weights are being re-randomized rather than loaded from the trained parameters.
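Here is a minimal sketch of what I suspect is happening (the variable name is taken from the tf.trainable_variables() output below): once the graph from forward_propagation is built, running the initializer in two separate sessions produces two different sets of fully connected weights.

# Minimal sketch: each run of the initializer resamples the
# fully_connected weights, so predictions differ run to run.
w = tf.get_default_graph().get_tensor_by_name("fully_connected/weights:0")
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    w1 = sess.run(w)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    w2 = sess.run(w)
print((w1 == w2).all())  # False: freshly randomized values each time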

When I run tf.trainable_variables(), I get this output:

    [<tf.Variable 'W1:0' shape=(4, 4, 3, 8) dtype=float32_ref>,
     <tf.Variable 'W2:0' shape=(2, 2, 8, 16) dtype=float32_ref>,
     <tf.Variable 'fully_connected/weights:0' shape=(64, 6) dtype=float32_ref>,
     <tf.Variable 'fully_connected/biases:0' shape=(6,) dtype=float32_ref>]

It appears the weight and bias are present as variables.

So my question is essentially this: how can I get tf.contrib.layers.fully_connected(P2, 6, activation_fn=None) to load the trained weights properly, even when the global variables initializer runs? Am I missing a step in the process?

I verified that the problem lies in the fully connected function: when I removed this layer and implemented z_out = tf.matmul(P2, W3) + b3 instead (where W3 and b3 were properly trained variables in the parameters dict), the behavior was stable, with the expected predictions.
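For reference, the manual final layer that behaved correctly looked roughly like this (wrapping the trained arrays as constants is my reading of why it is stable; the initializer has nothing to re-randomize):

# Manual final layer: the trained values from the parameters dict are
# baked into the graph, so the initializer cannot touch them.
W3 = tf.constant(parameters["W3"], dtype=tf.float32)  # shape (64, 6)
b3 = tf.constant(parameters["b3"], dtype=tf.float32)  # shape (6,)
z_out = tf.matmul(P2, W3) + b3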

Upvotes: 0

Views: 1210

Answers (2)

Bill

Reputation: 663

I figured this out. I need to call tf.train.Saver().save() at the end of the training session, then call tf.train.Saver().restore() in the prediction session. I also found that the graph needs to be reset before training starts and again before the variables are restored, to ensure a clean slate.

def model(X_train, Y_train, ...):
    ....

    tf.reset_default_graph()

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        ....
        # loop through epochs and mini-batches...
        # optimize parameters...
        parameters = sess.run(parameters)
        save_path = tf.train.Saver().save(sess, "/tmp/model.ckpt")
        print(f"Variables saved in path: {save_path}")
    ....

Then during prediction:

def predict(X, parameters):

    tf.reset_default_graph()

    x = tf.placeholder("float", shape=(None, 64, 64, 3))
    z3 = forward_propagation(x, parameters)
    a3 = tf.nn.softmax(z3, axis=1)
    p = tf.argmax(a3, axis=1)

    with tf.Session() as sess:
        # restore replaces the global variables initializer
        tf.train.Saver().restore(sess, "/tmp/model.ckpt")
        prediction = sess.run(p, feed_dict={x: X})

    return prediction
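As a sanity check (a sketch, using the variable name from the tf.trainable_variables() output in the question), the restored fully connected weights now come back identical on every run, unlike before:

# Assumes the graph has been rebuilt (e.g. via forward_propagation)
# before creating the Saver; prints the same values on every run.
with tf.Session() as sess:
    tf.train.Saver().restore(sess, "/tmp/model.ckpt")
    w = sess.run(tf.get_default_graph()
                 .get_tensor_by_name("fully_connected/weights:0"))
    print(w[0, :3])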

Upvotes: 0

Robin

Reputation: 1599

You should keep the trained weights somewhere in memory. Usually, forward_propagation and predict are methods of a Python class, and model can be wrapped in the __init__() of that class. Moreover, keep the TensorFlow layers as class attributes, like this: self.z_out = tf.contrib.layers.fully_connected(P2, 6, activation_fn=None)

Then, when you call predict, it will call forward_propagation, which reuses the already initialized and trained self.z_out layer, so no error is thrown.

Right now, your code redefines brand-new layers every time you call forward_propagation and then expects you to initialize them. Note that you should keep all the layers in your object, not only the last one (here, the convolutional layers too). A minimal sketch of that structure follows below.
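Here is that sketch; the conv/pool layers are illustrative stand-ins for the original ones, not the asker's actual architecture:

import tensorflow as tf

class ConvNet:
    def __init__(self):
        # Build the graph exactly once and keep the session alive.
        self.x = tf.placeholder("float", shape=(None, 64, 64, 3))
        self.z_out = self.forward_propagation(self.x)
        self.p = tf.argmax(tf.nn.softmax(self.z_out, axis=1), axis=1)
        self.sess = tf.Session()
        self.sess.run(tf.global_variables_initializer())
        # ... training loop (the body of model) would go here ...

    def forward_propagation(self, X):
        # Stand-ins for the original conv/pool layers producing P2.
        c1 = tf.contrib.layers.conv2d(X, 8, 4)
        P2 = tf.contrib.layers.flatten(tf.contrib.layers.max_pool2d(c1, 8))
        return tf.contrib.layers.fully_connected(P2, 6, activation_fn=None)

    def predict(self, X):
        # Reuses the live session and the trained layers: nothing is
        # re-initialized, so the weights are not re-randomized.
        return self.sess.run(self.p, feed_dict={self.x: X})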

Upvotes: 1
