Reputation: 663
I'm relatively new to neural nets and TensorFlow in general. In a course I'm taking, we constructed a CNN. In simplified terms, I think I can convey what is happening with some stripped-down statements:
In the first tf.Session(), the network is trained and the parameters are stored in a Python dictionary. I monitor the cost and train/test accuracy during training in TensorBoard, and I get very reasonable results.
def forward_propagation(X, parameters):
    ....
    model layers...
    ....
    z_out = tf.contrib.layers.fully_connected(P2, 6, activation_fn=None)
    return z_out
def model(X_train, Y_train,...):
    ....
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        ....
        loop thru epochs and mini-batches...
        optimize parameters...
        parameters = sess.run(parameters)
        ....
    return parameters  # python dict
Then, in the second tf.Session, predictions are made using the trained parameters from the first session.
def predict(X, parameters):
    x = tf.placeholder("float", shape=(None, 64, 64, 3))
    z3 = forward_propagation(x, parameters)
    a3 = tf.nn.softmax(z3, axis=1)
    p = tf.argmax(a3, axis=1)
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        prediction = sess.run(p, feed_dict={x: X})
    return prediction
But the second session requires sess.run(tf.global_variables_initializer()), because tf.contrib.layers.fully_connected() throws an error without it. As a result, the returned predictions are randomized and change on every run. I strongly suspect the weights are being re-randomized rather than loaded properly.
When I run the command tf.trainable_variables(), I get this output:
[<tf.Variable 'W1:0' shape=(4, 4, 3, 8) dtype=float32_ref>,
 <tf.Variable 'W2:0' shape=(2, 2, 8, 16) dtype=float32_ref>,
 <tf.Variable 'fully_connected/weights:0' shape=(64, 6) dtype=float32_ref>,
 <tf.Variable 'fully_connected/biases:0' shape=(6,) dtype=float32_ref>]
It appears the fully connected layer's weights and biases are present as variables.
So my question is essentially this: how can I get tf.contrib.layers.fully_connected(P2, 6, activation_fn=None) to load the trained weights properly, or at least keep them intact when the global variables initializer runs? Am I missing a step in the process?
I verified the problem lies in the fully connected layer: when I removed it and instead implemented z_out = tf.matmul(P2, W3) + b3 (where W3 and b3 were properly trained values from the parameters dict), the behavior was normal and stable, with the expected predictions.
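For reference, a minimal sketch of that workaround, assuming the trained numpy values from the parameters dict are wrapped in tf.constant (the tf.constant wrapping is my simplification, not the exact course code):

def forward_propagation(X, parameters):
    ....
    model layers...
    ....
    # W3 and b3 are the trained numpy arrays from the parameters dict, used as
    # constants, so this final layer needs no variable initialization at all
    W3 = tf.constant(parameters["W3"], dtype=tf.float32)
    b3 = tf.constant(parameters["b3"], dtype=tf.float32)
    z_out = tf.matmul(P2, W3) + b3
    return z_out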
Upvotes: 0
Views: 1210
Reputation: 663
I figured this out. I needed to add tf.train.Saver().save at the end of the training session, and then tf.train.Saver().restore in the prediction session. I also found that the graph needs to be reset before training starts and again before the variables are restored, to ensure a clean slate.
def model(X_train, Y_train,...):
    ....
    tf.reset_default_graph()
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        ....
        loop thru epochs and mini-batches...
        optimize parameters...
        parameters = sess.run(parameters)
        save_path = tf.train.Saver().save(sess, "/tmp/model.ckpt")
        print(f"Variables saved in path: {save_path}")
        ....
Then during prediction:
def predict(X, parameters):
    tf.reset_default_graph()
    x = tf.placeholder("float", shape=(None, 64, 64, 3))
    z3 = forward_propagation(x, parameters)
    a3 = tf.nn.softmax(z3, axis=1)
    p = tf.argmax(a3, axis=1)
    with tf.Session() as sess:
        tf.train.Saver().restore(sess, "/tmp/model.ckpt")
        prediction = sess.run(p, feed_dict={x: X})
    return prediction
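As a rough usage sketch (variable names here are placeholders, and model() takes more arguments than shown):

parameters = model(X_train, Y_train)      # trains and writes /tmp/model.ckpt
predictions = predict(X_new, parameters)  # rebuilds the graph and restores the checkpoint
print(predictions[:10])

Note that the restore only works because predict rebuilds the graph with the same variable names that existed at save time; tf.train.Saver matches variables by name when restoring.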
Upvotes: 0
Reputation: 1599
You should keep the trained weights somewhere in memory. Usually, forward_propagation and predict are methods of a Python class, and model can be wrapped in the __init__() of that class. Moreover, keep the TensorFlow variables as class attributes, like this: self.z_out = tf.contrib.layers.fully_connected(P2, 6, activation_fn=None). Then when you call predict, you'll call forward_propagation(x, parameters), which will reuse the already initialized and trained self.z_out layer, so no error will be thrown.
Right now, your code just redefines brand-new layers every time you call forward_propagation and expects you to initialize them. Note that you should keep all layers in your object, not only the last one (here, the convolutional layers too).
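A minimal sketch of that structure (class name and layer details are illustrative, not your actual code):

class ConvNet:
    def __init__(self, X_train, Y_train):
        # build the graph once and keep every piece as an attribute
        self.x = tf.placeholder("float", shape=(None, 64, 64, 3))
        self.z_out = self.forward_propagation(self.x)
        self.sess = tf.Session()
        self.sess.run(tf.global_variables_initializer())
        # ... training loop over epochs and mini-batches goes here ...

    def forward_propagation(self, X):
        # ... convolutional/pooling layers producing P2 go here ...
        P2 = tf.contrib.layers.flatten(X)  # stand-in for the real layers
        return tf.contrib.layers.fully_connected(P2, 6, activation_fn=None)

    def predict(self, X):
        # reuses the same graph and session, so the trained weights are still live
        p = tf.argmax(tf.nn.softmax(self.z_out, axis=1), axis=1)
        return self.sess.run(p, feed_dict={self.x: X})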
Upvotes: 1