VS_FF

Reputation: 2373

TF LSTM: Save State from training session for prediction session later

I am trying to save the latest LSTM State from training so it can be reused during the prediction stage later. The problem I am encountering is that in the TF LSTM model, the State is passed from one training iteration to the next via a combination of a placeholder and a numpy array -- neither of which is included in the Graph by default when the session is saved.
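To illustrate the pattern I mean, here is a minimal self-contained sketch (a small GRU cell with made-up sizes, not my actual model):

import numpy as np
import tensorflow as tf

BATCH, UNITS = 4, 8
x = tf.placeholder(tf.float32, [BATCH, UNITS])
state_in = tf.placeholder(tf.float32, [BATCH, UNITS])   # state enters via a feed
cell = tf.nn.rnn_cell.GRUCell(UNITS)
out, state_out = cell(x, state_in)                      # state leaves via a fetch

sess = tf.Session()
sess.run(tf.global_variables_initializer())
current_state = np.zeros([BATCH, UNITS], np.float32)    # lives only in Python
for _ in range(3):
    batch = np.random.randn(BATCH, UNITS).astype(np.float32)
    _, current_state = sess.run([out, state_out],
                                feed_dict={x: batch, state_in: current_state})
# current_state is just a numpy array -- a tf.train.Saver checkpoint never includes it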

To work around this, I am creating a dedicated TF variable to hold the latest version of the state so as to add it to the Session graph, like so:

# latest State from last training iteration:
_, y, ostate, smm = sess.run([train_step, Y, H, summaries], feed_dict=feed_dict)
# now add to TF variable:
savedState = tf.Variable(ostate, dtype=tf.float32, name='savedState')
tf.variables_initializer([savedState]).run()
save_path = saver.save(sess, pathModel + '/my_model.ckpt')

This seems to add the savedState variable to the saved session graph correctly, and it is easily recoverable later along with the rest of the Session.

The problem, though, is that the only way I have managed to actually use that variable later in the restored Session is to initialize all variables in the session AFTER I recover it -- which seems to reset all trained variables, including the weights/biases/etc.! If I initialize variables first and THEN recover the session (which works fine in terms of preserving the trained variables), then I get an error that I'm trying to access an uninitialized variable.

I know there is a way to initialize a specific individual variable (which I am using while saving it originally), but the problem is that when we recover them, we refer to them by name as strings -- we don't just pass the variable itself?!

# This produces an error: 'trying to use an uninitialized variable'
gInit = tf.global_variables_initializer().run()
new_saver = tf.train.import_meta_graph(pathModel + 'my_model.ckpt.meta')
new_saver.restore(sess, pathModel + 'my_model.ckpt')
fullState = sess.run('savedState:0')
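For what it's worth, the closest thing I've found to initializing only the "missing" variables by name is to ask TF which ones are still uninitialized after the restore and initialize just those -- a sketch, not something from my actual code:

# names of variables the checkpoint did not cover (bytes in Python 3):
names = set(sess.run(tf.report_uninitialized_variables()))
missing = [v for v in tf.global_variables()
           if v.name.split(':')[0].encode() in names]
sess.run(tf.variables_initializer(missing))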

What is the right way to get this done? As a workaround, I am currently saving the State to CSV just as a numpy array and then recovering it the same way. It works OK, but it's clearly not the cleanest solution, given that every other aspect of saving/restoring the TF session works perfectly.
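For completeness, the CSV workaround is roughly this (assuming the state is a 2-D numpy array):

np.savetxt(pathModel + '/state.csv', ostate, delimiter=',')
# ... later, in the prediction program:
fullState = np.loadtxt(pathModel + '/state.csv', delimiter=',', dtype=np.float32)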

Any suggestions appreciated!

EDIT: Here's the code that works well, as described in the accepted answer below:

# make sure to define the State variable before the Saver variable:
savedState = tf.get_variable('savedState', shape=[BATCHSIZE, CELL_SIZE * LAYERS])
saver = tf.train.Saver(max_to_keep=1)
# last training iteration:
_, y, ostate, smm = sess.run([train_step, Y, H, summaries], feed_dict=feed_dict)
# now save the State and the whole model:
assignOp = tf.assign(savedState, ostate)
sess.run(assignOp)
save_path = saver.save(sess, pathModel + '/my_model.ckpt')


# later on, in some other program, recover the model and the State:
import numpy as np
import tensorflow as tf
# make sure to initialize all variables BEFORE recovering the model!
gInit = tf.global_variables_initializer().run()
local_saver = tf.train.import_meta_graph(pathModel + 'my_model.ckpt.meta')
local_saver.restore(sess, pathModel + 'my_model.ckpt')
# recover the state from training and get its last dimension
fullState = sess.run('savedState:0')
h = fullState[-1]
h = np.reshape(h, [1, -1])

I haven't tested yet whether this approach unintentionally initializes any other variables in the saved Session, but I don't see why it should, since we only run this specific one.

Upvotes: 1

Views: 593

Answers (1)

Allen Lavoie

Reputation: 5808

The issue is that creating a new tf.Variable after the Saver was constructed means that the Saver has no knowledge of the new variable. It still gets recorded in the metagraph, but it is not saved in the checkpoint:

import tensorflow as tf
with tf.Graph().as_default():
  var_a = tf.get_variable("a", shape=[])
  saver = tf.train.Saver()
  var_b = tf.get_variable("b", shape=[])
  print(saver._var_list) # [<tf.Variable 'a:0' shape=() dtype=float32_ref>]
  initializer = tf.global_variables_initializer()
  with tf.Session() as session:
    session.run([initializer])
    saver.save(session, "/tmp/model", global_step=0)
with tf.Graph().as_default():
  new_saver = tf.train.import_meta_graph("/tmp/model-0.meta")
  print(saver._var_list) # [<tf.Variable 'a:0' shape=() dtype=float32_ref>]
  with tf.Session() as session:
    new_saver.restore(session, "/tmp/model-0") # Only var_a gets restored!

I've annotated the quick reproduction of your issue above with the variables that the Saver knows about.

Now, the solution is relatively easy. I would suggest creating the Variable before the Saver, then using tf.assign to update its value (make sure you run the op returned by tf.assign). The assigned value will be saved in checkpoints and restored just like other variables.
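In other words, something along these lines (shapes are placeholders for whatever your state needs):

import tensorflow as tf
with tf.Graph().as_default():
  saved_state = tf.get_variable("savedState", shape=[4, 8])  # created BEFORE the Saver
  saver = tf.train.Saver()                                   # so the Saver tracks it
  assign_op = tf.assign(saved_state, tf.ones([4, 8]))        # stand-in for the real state
  with tf.Session() as session:
    session.run(tf.global_variables_initializer())
    session.run(assign_op)                        # the assign must actually be run
    saver.save(session, "/tmp/model_with_state")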

This could be handled better by the Saver as a special case when None is passed to its var_list constructor argument (i.e. it could pick up new variables automatically). Feel free to open a feature request on GitHub for this.
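In the meantime, a workaround is to construct the Saver (or an additional one) only after every variable exists, since var_list=None snapshots the graph's variables at construction time. In the reproduction above, for example:

full_saver = tf.train.Saver()   # constructed after var_b: now covers both a and b
full_saver.save(session, "/tmp/model_full", global_step=0)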

Upvotes: 1
