Reputation: 518
I have been struggling for a day to restore a model, without any success. My code consists of a class TF_MLPRegressor(), whose constructor defines the network architecture. I then invoke the fit() function to do the training. This is how I save a simple perceptron model with 1 hidden layer from within the fit() function:
starting_epoch = 0
# Launch the graph
tf.set_random_seed(self.random_state)  # fix the random seed before creating the Session so that it takes effect!
if hasattr(self, 'sess'):
    self.sess.close()
    del self.sess  # delete the Session to release memory
    gc.collect()
self.sess = tf.Session(config=self.config)  # keep the session around to predict from new data
# Create a saver object which will save all the variables
saver = tf.train.Saver(max_to_keep=2)  # max_to_keep=2 means keep no more than 2 checkpoint files
self.sess.run(tf.global_variables_initializer())
# ... (every 100 epochs)
saver.save(self.sess, self.checkpoint_dir + "/resume", global_step=epoch)
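For context, a minimal sketch of how this periodic save might sit inside the training loop; n_epochs and next_batch() are hypothetical names, while self.optimizer, self.cost, batch_x and batch_y are the ones used in the training call further down:

for epoch in range(starting_epoch, n_epochs):  # n_epochs: hypothetical epoch count
    batch_x, batch_y = self.next_batch()  # hypothetical mini-batch helper
    _, c = self.sess.run([self.optimizer, self.cost],
                         feed_dict={self.x: batch_x, self.y: batch_y})
    if epoch % 100 == 0:  # checkpoint every 100 epochs, as in the comment above
        saver.save(self.sess, self.checkpoint_dir + "/resume", global_step=epoch)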
Then I create a new TF_MLPRegressor() instance with exactly the same input parameter values and invoke the fit() function to restore the model like this:
self.sess = tf.Session(config=self.config)  # create a new session to load saved variables
ckpt = tf.train.latest_checkpoint(self.checkpoint_dir)
starting_epoch = int(ckpt.split('-')[-1])  # recover the epoch from the checkpoint name
metagraph = ".".join([ckpt, 'meta'])
saver = tf.train.import_meta_graph(metagraph)
self.sess.run(tf.global_variables_initializer())  # initialize variables
lhl = tf.trainable_variables()[2]  # weights of the last hidden layer
lhlA = lhl.eval(session=self.sess)  # values right after initialization
saver.restore(sess=self.sess, save_path=ckpt)  # restore model weights from the previously saved model
lhlB = lhl.eval(session=self.sess)  # values after restoring
print(lhlA == lhlB)
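Note that == on two NumPy arrays prints an element-wise boolean matrix; for a single yes/no answer the comparison could be written like this (numpy import assumed):

import numpy as np

print(np.array_equal(lhlA, lhlB))  # True iff every weight is identical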
lhlA and lhlB are the last hidden layer weights before and after restoring, and according to my code they match completely, i.e. the saved model is not loaded into the session. What am I doing wrong?
Upvotes: 2
Views: 931
Reputation: 518
I found a workaround! Strangely, the metagraph does not contain all the variables that I defined, or it assigns new names to them. For example, in the constructor I define the tensors that will carry the input feature vectors and the experimental values:
self.x = tf.placeholder("float", [None, feat_num], name='x')
self.y = tf.placeholder("float", [None], name='y')
However, when I do tf.reset_default_graph() and load the metagraph, I get the following list of variables:
[
<tf.Variable 'Variable:0' shape=(300, 300) dtype=float32_ref>,
<tf.Variable 'Variable_1:0' shape=(300,) dtype=float32_ref>,
<tf.Variable 'Variable_2:0' shape=(300, 1) dtype=float32_ref>,
<tf.Variable 'Variable_3:0' shape=(1,) dtype=float32_ref>
]
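For reference, a minimal sketch of the inspection that produces this list; the checkpoint directory is a hypothetical placeholder:

import tensorflow as tf

tf.reset_default_graph()
ckpt = tf.train.latest_checkpoint("checkpoints/")  # hypothetical directory
saver = tf.train.import_meta_graph(ckpt + ".meta")  # rebuild the saved graph structure
print(tf.trainable_variables())  # -> the four 'Variable*' entries above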
For the record, each input feature vector has 300 features. Anyway, when I later try to initiate training using:
_, c, p = self.sess.run([self.optimizer, self.cost, self.pred],
feed_dict={self.x: batch_x, self.y: batch_y, self.isTrain: True})
I get an error like:
"TypeError: Cannot interpret feed_dict key as Tensor: Tensor 'x' is not an element of this graph."
So, since every time I create an instance of class TF_MLPRegressor() I define the network architecture within the constructor anyway, I decided not to load the metagraph, and it worked! I don't know why TF doesn't save all variables into the metagraph; maybe it is because I define the network architecture explicitly (I don't use wrappers or default layers), like in the example below.
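A sketch of that kind of explicit definition; the variable names and the activation are assumptions, but the shapes match the list above (feat_num = 300, one hidden layer of 300 units):

# Weights/biases created without explicit names, which is why they
# show up as 'Variable:0', 'Variable_1:0', etc. in the metagraph
W1 = tf.Variable(tf.random_normal([feat_num, 300]))  # (300, 300)
b1 = tf.Variable(tf.random_normal([300]))  # (300,)
W2 = tf.Variable(tf.random_normal([300, 1]))  # (300, 1)
b2 = tf.Variable(tf.random_normal([1]))  # (1,)

hidden = tf.nn.relu(tf.add(tf.matmul(self.x, W1), b1))  # hidden layer
self.pred = tf.squeeze(tf.add(tf.matmul(hidden, W2), b2))  # shape (None,) to match self.y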
To sum up, I save my models as described in my first message, but to restore them I use the code below. Since the constructor has already rebuilt exactly the same graph, the Saver matches each saved variable by name and overwrites the freshly initialized values:
saver = tf.train.Saver(max_to_keep=2)
self.sess = tf.Session(config=self.config) # create a new session to load saved variables
self.sess.run(tf.global_variables_initializer())
ckpt = tf.train.latest_checkpoint(self.checkpoint_dir)
saver.restore(sess=self.sess, save_path=ckpt) # Restore model weights from previously saved model
Upvotes: 1