Reputation: 1285
I've seen this issue brought up a few times on StackOverflow, but none of the solutions have helped me.
I've trained an actor-critic reinforcement learning network in tensorflow.compat.v1, and I'm using the saver.save() function throughout training to save the model files as it goes, so I end up with the .index, .meta and .data files. I'm using Python 3.6 on Windows.
Now, in a second script, I want to reload this model, using the exact same architecture and dataset. When I run it, though, I get completely different results, which I think tells me I'm not loading the model properly. Note that I'm using self.sess = tf.InteractiveSession(), so I'm not running it within a with sess block.
So in the training script, I reference my Actor and Critic networks and begin the session:
tf.reset_default_graph()
self.actor = Actor("actor-original", self.state_size, self.OUTPUT_SIZE, self.LAYER_SIZE)
self.actor_target = Actor("actor-target", self.state_size, self.OUTPUT_SIZE, self.LAYER_SIZE)
self.critic = Critic("critic-original", self.state_size, self.OUTPUT_SIZE, self.LAYER_SIZE, self.LEARNING_RATE)
self.critic_target = Critic("critic-target", self.state_size, self.OUTPUT_SIZE, self.LAYER_SIZE, self.LEARNING_RATE)
self.grad_critic = tf.gradients(self.critic.logits, self.critic.Y)
self.actor_critic_grad = tf.placeholder(tf.float32, [None, self.OUTPUT_SIZE])
weights_actor = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope="actor")
self.grad_actor = tf.gradients(self.actor.logits, weights_actor, -self.actor_critic_grad)
grads = zip(self.grad_actor, weights_actor)
self.optimizer = tf.train.AdamOptimizer(self.LEARNING_RATE).apply_gradients(grads) # Adam optimizer
self.sess = tf.InteractiveSession() # Start the session
self.sess.run(tf.global_variables_initializer())
Then I use the saver.save() function below during training when certain metrics are met:
saver = tf.train.Saver(max_to_keep=1)
save_path = saver.save(self.sess, "./model_checkpoint_files")
Now, in my secondary script, I want to reload this model. So far, what I have is this:
tf.reset_default_graph()
self.actor = Actor("actor-original", self.state_size, self.OUTPUT_SIZE, self.LAYER_SIZE)
self.actor_target = Actor("actor-target", self.state_size, self.OUTPUT_SIZE, self.LAYER_SIZE)
self.critic = Critic("critic-original", self.state_size, self.OUTPUT_SIZE, self.LAYER_SIZE, self.LEARNING_RATE)
self.critic_target = Critic("critic-target", self.state_size, self.OUTPUT_SIZE, self.LAYER_SIZE, self.LEARNING_RATE)
self.grad_critic = tf.gradients(self.critic.logits, self.critic.Y)
self.actor_critic_grad = tf.placeholder(tf.float32, [None, self.OUTPUT_SIZE])
weights_actor = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope="actor")
self.grad_actor = tf.gradients(self.actor.logits, weights_actor, -self.actor_critic_grad)
grads = zip(self.grad_actor, weights_actor)
self.optimizer = tf.train.AdamOptimizer(self.LEARNING_RATE).apply_gradients(grads) # Adam optimizer
self.sess = tf.InteractiveSession() # Start the session
#self.sess.run(tf.global_variables_initializer())
saver = tf.compat.v1.train.Saver()
#saver = tf.compat.v1.train.import_meta_graph("./model_checkpoint_files.meta")
saver.restore(self.sess, "./model_checkpoint_files")
#self.sess.run(tf.local_variables_initializer()) # tf.initialize_all_variables() # tf.local_variables_initializer() # tf.global_variables_initializer()
As you can see, I've tried several combinations of model loading to get this working. Further on in my second script, I just call self.sess.run() to get the best action.
Does anyone see anything that I'm missing? Just looking to load the model and use it on the same dataset to get repeatable results. Thanks!
UPDATE
The more I read about this, the more I think the reason is that some values aren't being saved by the tf.saver() function. Should I be using the saved_model() function instead, since it also saves the values of the variables used in training? Using my sample code above, how might I go about implementing that? Thanks!
Upvotes: 2
Views: 1304
Reputation: 7676
I think the problem is that you create a new session when you restore the model with self.sess = tf.InteractiveSession(). The session used by a Saver to save a model should be the same as the one used to restore the model, since a tensorflow graph lives inside a session.
Here is a working example that you can modify:
import numpy as np
import tensorflow as tf

tf.reset_default_graph()

class Model():
    def __init__(self):
        # Two-layer linear model fitted with a mean squared error loss
        self.x = tf.placeholder('float32',shape=[None,1])
        self.y = tf.placeholder('float32',shape=[None,1])
        self.w1 = tf.get_variable('w1',initializer=tf.zeros([1, 20]))
        self.b1 = tf.get_variable('b1',initializer=tf.zeros([20]))
        self.w2 = tf.get_variable('w2',initializer=tf.zeros([20, 1]))
        self.b2 = tf.get_variable('b2',initializer=tf.zeros([1]))
        self.y1 = tf.matmul(self.x, self.w1) + self.b1
        self.y2 = tf.matmul(self.y1, self.w2) + self.b2
        self.loss = tf.reduce_mean(tf.square(self.y - self.y2))
        self.train_step = tf.train.AdamOptimizer(0.01).minimize(self.loss)
        # The same session and saver are reused for training, saving and restoring
        self.sess = tf.InteractiveSession()
        self.saver = tf.train.Saver()

    def train(self,x_train,y_train):
        self.sess.run(tf.global_variables_initializer())
        for epoch in range(100):
            for i in range(10):
                batch_xs, batch_ys = x_train[i*10:(i+1)*10][:,None],y_train[i*10:(i+1)*10][:,None]
                self.sess.run(self.train_step, feed_dict={self.x: batch_xs, self.y: batch_ys})
            if epoch == 50:
                # Save a checkpoint halfway through and record the loss at that point
                self.saver.save(self.sess, 'tmp/model.ckpt')
                self.inside_val = self.sess.run(self.loss, feed_dict={self.x:x_train[:,None],self.y:y_train[:,None]})
        # Loss after all 100 epochs
        print(self.sess.run(self.loss, feed_dict={self.x:x_train[:,None],self.y:y_train[:,None]}))

    def test(self,x_test,y_test):
        print(self.sess.run(self.loss, feed_dict={self.x: x_test[:,None], self.y: y_test[:,None]}))

    def reload(self,model_path,x_test,y_test):
        # Restore the checkpointed weights and return the loss they produce
        self.saver.restore(self.sess, model_path)
        return self.sess.run(self.loss, feed_dict={self.x: x_test[:,None], self.y: y_test[:,None]})

x_train = np.linspace(1, 6, 101)
y_train = 2 * x_train + 3 + 0.1 * np.random.random(101)

m = Model()
m.train(x_train,y_train)

# test
m.test(x_train,y_train)

# The loss after restoring should equal the loss recorded when the checkpoint was saved
try:
    assert m.inside_val == m.reload('tmp/model.ckpt',x_train,y_train)
    print('checked')
except AssertionError:
    print('The values are different.')
Running this script, you will find the trained model is successfully restored.
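If you want to check a restore more directly than by comparing losses, one option is to compare the variable values themselves and to look at which variables the checkpoint actually contains. The snippet below is only a minimal sketch, not your actor-critic code: the function name save_and_restore_roundtrip, the single variable w and the temporary checkpoint directory are made up for illustration, and it assumes tensorflow.compat.v1 is importable (TF 1.15 or a TF 2.x build that still ships it).

```python
import os
import tempfile

import numpy as np
import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()  # assumption: graph-mode API, as in the question


def save_and_restore_roundtrip():
    """Save one variable, overwrite it, restore it, and report what was stored.

    Returns (saved_value, restored_value, names_in_checkpoint).
    Everything here is a hypothetical stand-in for a real model.
    """
    graph = tf.Graph()
    with graph.as_default():
        # Randomly initialised, so re-running the initializer gives new values
        w = tf.get_variable("w", initializer=tf.random_normal([3, 2]))
        saver = tf.train.Saver()
        init = tf.global_variables_initializer()

    ckpt_prefix = os.path.join(tempfile.mkdtemp(), "model.ckpt")
    with tf.Session(graph=graph) as sess:
        sess.run(init)
        saved = sess.run(w)
        saver.save(sess, ckpt_prefix)

        sess.run(init)                    # deliberately overwrite the weights
        saver.restore(sess, ckpt_prefix)  # bring back the checkpointed values
        restored = sess.run(w)

    # list_variables returns (name, shape) pairs stored in the checkpoint
    names = [name for name, _ in tf.train.list_variables(ckpt_prefix)]
    return saved, restored, names


if __name__ == "__main__":
    saved, restored, names = save_and_restore_roundtrip()
    print("values match:", np.array_equal(saved, restored))
    print("variables in checkpoint:", names)
```

Running the same kind of comparison on a few of your actor and critic variables, and checking that their names appear in the list for your own checkpoint, should show whether the weights are really being loaded or whether the difference comes from somewhere else in the second script.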
Upvotes: 2