Al Wld

Reputation: 939

Tensorflow hook after_run not called

I am looking at this example from Google, which uses a MonitoredSession. It seems like a really convenient class for saving summaries every n steps. According to the docs, the following snippet:

    with tf.train.MonitoredTrainingSession(master=target,
                                           is_chief=is_chief,
                                           checkpoint_dir=job_dir,
                                           save_checkpoint_secs=None,
                                           save_summaries_steps=20) as session:
        while True:
            pass  # do training

should save my summaries every 20 steps. It almost does, but sometimes my summaries are not saved, which is a real problem.

Internally, the MonitoredSession creates a SummarySaverHook, and we would expect its before_run / after_run callbacks to be called once every n global steps. That seems to be the case.

What I have noticed is that the callbacks are not always called from the same thread, so I assume this could be a source of the issue. But I really have no idea what is going on, and it is very difficult to debug.

I am sorry for the lack of clarity in my question, but I am really having trouble understanding what is going on. Has anyone been in a similar situation, or does anyone know where this is coming from?

Thank you

Upvotes: 0

Views: 279

Answers (1)

Praveen Kulkarni

Reputation: 3251

Did you try passing your hooks through the hooks argument of MonitoredTrainingSession?

    with tf.train.MonitoredTrainingSession(master=target,
                                           hooks=hooks,  # your list of SessionRunHook instances
                                           is_chief=is_chief,
                                           checkpoint_dir=job_dir,
                                           save_checkpoint_secs=None,
                                           save_summaries_steps=20) as session:
        while True:
            pass  # do training
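
If you write your own hook for debugging, it may help to keep in mind that SessionRunHook callbacks run around every run() call, and the hook itself decides whether enough steps have passed. Below is a simplified pure-Python model of that pattern. It is only a sketch, not TensorFlow's actual code: EveryNStepsHook, ToySession, simulate and the step_increment parameter are hypothetical names invented for this illustration. step_increment models the assumption that the global step can advance by more than one between two run() calls (for example if several workers increment it).

```python
class EveryNStepsHook:
    """Toy stand-in for a summary-saving hook (hypothetical, not TensorFlow code)."""

    def __init__(self, every_n_steps):
        self.every_n_steps = every_n_steps
        self.last_triggered_step = None
        self.next_step = None
        self.request_summary = False
        self.saved_at = []  # global steps at which a "summary" was written

    def _should_trigger(self, step):
        # Trigger on the first call, then once at least n steps have elapsed.
        if self.last_triggered_step is None:
            return True
        return step >= self.last_triggered_step + self.every_n_steps

    def before_run(self):
        # Invoked before EVERY run() call; only decides whether to request a summary.
        self.request_summary = (
            self.next_step is None or self._should_trigger(self.next_step)
        )

    def after_run(self, global_step):
        # Invoked after EVERY run() call with the step value seen by that call.
        if self.request_summary:
            self.saved_at.append(global_step)
            self.last_triggered_step = global_step
        self.next_step = global_step + 1


class ToySession:
    """Toy session that wraps each run() call with the hook callbacks."""

    def __init__(self, hooks, step_increment=1):
        self.hooks = hooks
        self.step_increment = step_increment  # assumption: >1 models other workers
        self.global_step = 0

    def run(self):
        for hook in self.hooks:
            hook.before_run()
        self.global_step += self.step_increment  # stands in for the training op
        for hook in self.hooks:
            hook.after_run(self.global_step)


def simulate(num_runs, every_n_steps, step_increment):
    """Run the toy loop and return the steps at which summaries were 'saved'."""
    hook = EveryNStepsHook(every_n_steps)
    session = ToySession([hook], step_increment)
    for _ in range(num_runs):
        session.run()
    return hook.saved_at
```

In this model, simulate(60, 20, 1) returns [1, 21, 41], while simulate(20, 20, 3) returns [3, 27, 51]: the callbacks still fire on every call, but the saved steps are 24 apart rather than 20. Whether something similar explains the missing summaries in your job is an assumption you would need to check, for instance by logging the global step from a hook of your own.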

Upvotes: 1
