Reputation: 47
[Python 3.7, TensorFlow] I have trained a neural network. Everything works fine and it learns, but once it's done learning it just shuts down and the progress is lost. What I want to do now is feed in new data and check by hand how well the network does.
I already fiddled around with
saver = tf.train.Saver()
saver.save(sess, 'model/model.ckpt')
but that always results in a mile-long error report ending with "Unknown Error: Failed to rename 'model/model.ckpt'" etc.
The code in context looks like this:
def train_neural_network(x):
    training_data = generate_training_data() # i cut getting training data since its a bit out of context here, but its basically like mnist data
    prediction = neural_network_model(x) # normal, 3-layer feed forward NN
    cost = tf.reduce_mean( tf.square(prediction - y) )
    optimizer = tf.train.AdamOptimizer(0.01).minimize(cost)
    hm_epochs = 10
    saver = tf.train.Saver()
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for epoch in range(hm_epochs+1):
            epoch_loss = 0
            for i in range(10):
                epoch_x, epoch_y = training_data
                _, c = sess.run([optimizer, cost], feed_dict = {x: epoch_x, y: epoch_y})
        saver.save(sess, 'model/model.ckpt')
I try to call this trained neural network in main:
train_neural_network(x)
X, Y = generate_training_data()
prediction = neural_network_model(x)
saver = tf.train.Saver()
with tf.Session() as sess:
    saver.restore(sess, 'model/model.ckpt')
    result = sess.run(prediction, feed_dict={x: X})
print(Y, result)
It's all in one file so far, but I could also split it into two separate files.
This results in an error: the usual Python traceback, which ends with
"...in _do_call
raise type(e)(node_def, op, message)"
followed by what I think is a TensorFlow-specific error:
"Unknown Error: Failed to rename 'model/model.ckpt'"
and
"Caused by op 'save_13/SaveV2', defined at:",
then there's a long traceback, about 87 lines,
after which "Unknown Error" is repeated.
What I want is to print the label alongside the output predicted by the neural network (the print line in the code).
Unfortunately, I have not found anything that works in my internet searches so far, but I feel like it should not be too hard to get this working. Thank you in advance.
Upvotes: 1
Views: 160
Reputation: 362
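If you take a look in the folder where your model outputs the checkpoints (model/), you should see three kinds of files for each save: model.ckpt-xxx.data-00000-of-00001, model.ckpt-xxx.index and model.ckpt-xxx.meta, where xxx is the ID of the checkpoint. TensorFlow appends this ID when global_step is passed to saver.save; without it, the files are named model.ckpt.index and so on, with no -xxx suffix.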
When you want to restore a certain checkpoint, you have to add the ID as well, because typically several checkpoints of the same network are being created during training so that we can re-train the network later if necessary.
So I would take a look in the model folder and double-check the file name. My guess is that
saver.restore(sess, 'model/model.ckpt-0')
would do the trick if you only created one checkpoint.
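If you would rather not check the folder by hand, here is a minimal sketch that lists what is in there. It assumes the checkpoints live in a folder called model, as in your code, and that the prefix to pass to saver.restore is the name of each .index file with that extension removed. The helper name list_checkpoint_prefixes is made up for this example and is not part of TensorFlow; it only uses the standard library.

```python
import os


def list_checkpoint_prefixes(checkpoint_dir):
    """Return checkpoint prefixes found in checkpoint_dir, sorted by file name.

    Hypothetical helper (not a TensorFlow API). A prefix is the path of a
    '.index' file with the '.index' extension stripped, which is the form
    of path that saver.restore expects.
    """
    suffix = ".index"
    prefixes = []
    for name in sorted(os.listdir(checkpoint_dir)):
        if name.endswith(suffix):
            # 'model.ckpt-10.index' -> 'model/model.ckpt-10'
            prefixes.append(os.path.join(checkpoint_dir, name[: -len(suffix)]))
    return prefixes


if __name__ == "__main__":
    # 'model' is the folder name used in the question.
    if os.path.isdir("model"):
        for prefix in list_checkpoint_prefixes("model"):
            print(prefix)
    else:
        print("No 'model' folder found in the current working directory.")
```

Whatever it prints is what you can try in saver.restore(sess, ...). TensorFlow 1.x also has tf.train.latest_checkpoint('model'), which returns the most recent prefix recorded in that folder, or None if it finds nothing. If the folder is missing or empty, the save step itself failed and that needs fixing first.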
Upvotes: 1