Reputation: 1962
No GPU, no queues, TensorFlow 1.1.0
There's this sample LSTM code:
https://github.com/tensorflow/models/blob/master/tutorials/rnn/ptb/ptb_word_lm.py
This code works: it prints training progress info, which is nice. Then I tried to write the trained model's graph to disk using freeze_graph(), and eventually found out that this LSTM tutorial uses a Supervisor to train the model, that the Supervisor finalizes the graph (makes it read-only), and that a finalized graph cannot be used in the freeze_graph() procedure.
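(To illustrate what "finalized" means here, a minimal standalone snippet, nothing specific to the tutorial: once finalize() has been called on a graph, any attempt to add ops to it raises a RuntimeError.)

import tensorflow as tf

g = tf.Graph()
with g.as_default():
    tf.constant(1, name="a")
g.finalize()  # this is what Supervisor does to the graph internally
with g.as_default():
    tf.constant(2, name="b")  # RuntimeError: Graph is finalized and cannot be modified.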
I tried to switch from the Supervisor to an ordinary session. The only changes I made were in the main() procedure, apart from importing some stuff. It now looks like this (changed parts are marked with # CHANGED comments; I removed all the graph-saving code, since that is not the issue here):
from tensorflow.python.client import session   # CHANGED: assumed import for session.Session below
from tensorflow.python.ops import variables    # CHANGED: assumed import for variables.global_variables_initializer below

with tf.Graph().as_default():
    initializer = tf.random_uniform_initializer(
        -config.init_scale, config.init_scale)

    with tf.name_scope("Train"):
        train_input = PTBInput(
            config=config, data=train_data, name="TrainInput")
        with tf.variable_scope("Model", reuse=None, initializer=initializer):
            m = PTBModel(
                is_training=True, config=config, input_=train_input)
        tf.summary.scalar("Training Loss", m.cost)
        tf.summary.scalar("Learning Rate", m.lr)

    with session.Session() as sess:  # CHANGED
        sess.run(variables.global_variables_initializer())  # CHANGED
        for i in range(config.max_max_epoch):
            lr_decay = config.lr_decay ** max(
                i + 1 - config.max_epoch, 0.0)
            m.assign_lr(sess, config.learning_rate * lr_decay)
            print("Epoch: %d Learning rate: %.3f" %
                  (i + 1, sess.run(m.lr)))
            train_perplexity = run_epoch(sess, m, eval_op=m.train_op,
                                         verbose=True)
            print("Epoch: %d Train Perplexity: %.3f" %
                  (i + 1, train_perplexity))
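(For comparison, the original main() in the tutorial trains roughly like this, paraphrased from the linked file:)

sv = tf.train.Supervisor(logdir=FLAGS.save_path)
with sv.managed_session() as session:
    for i in range(config.max_max_epoch):
        # ... the same per-epoch loop as above ...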
After these changes, the whole thing started to hang at this very line:
https://github.com/tensorflow/models/blob/master/tutorials/rnn/ptb/ptb_word_lm.py#L300
It is a session.run() call in the model internals (it doesn't react to Ctrl+C, and is only killable with kill -9):

vals = session.run(fetches, feed_dict)

Previous session.run() calls (there are several) worked just fine.
What did I do wrong? All variables seem to be initialized just fine (which the Supervisor did in the original code). Any ideas?
Upvotes: 1
Views: 277
Reputation: 126184
When you use tf.train.Supervisor, the framework automatically calls tf.train.start_queue_runners(sess) (along with initializing variables) at the beginning of the session. The PTB tutorial reads its input through queues, so if you switch to a raw tf.Session, you must make this call manually to start the input pipeline; otherwise the first session.run() that needs input data will block forever. A change like the following should work:
# ...
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    tf.train.start_queue_runners(sess)
    # ...
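If you also want the input threads to shut down cleanly when training ends or fails, a common TF 1.x pattern (a sketch, not part of the tutorial code) is to pair the queue runners with a tf.train.Coordinator:

coord = tf.train.Coordinator()
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # start_queue_runners returns the threads it started
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    try:
        pass  # ... run the training loop here ...
    finally:
        coord.request_stop()  # ask the input threads to stop
        coord.join(threads)   # wait for them to exit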
Upvotes: 2