Reputation: 1227
I'm looking into the code for the distributed Inception model in TF, and I have the following questions about the use of tf.train.Supervisor.start_queue_runners in inception_distributed_train.py:
Why do we need to explicitly call sv.start_queue_runners() in line 264 and line 269 of inception_distributed_train.py? According to the API doc of start_queue_runners, there is no need for such calls:
Note that the queue runners collected in the graph key QUEUE_RUNNERS are already started automatically when you create a session with the supervisor, so unless you have non-collected queue runners to start you do not need to call this explicitly.
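The quoted behaviour can be pictured with a toy, pure-Python model. This is not TensorFlow code; ToySupervisor, ToyQueueRunner and create_session are hypothetical names. Runners registered in the QUEUE_RUNNERS collection start when the session is created, while a runner outside the collection only starts when it is passed explicitly to start_queue_runners.

```python
# Toy, pure-Python model of the documented Supervisor behaviour.
# NOT TensorFlow code: every class and name here is hypothetical.

QUEUE_RUNNERS = "queue_runners"  # stands in for tf.GraphKeys.QUEUE_RUNNERS


class ToyQueueRunner:
    def __init__(self, name):
        self.name = name
        self.running = False

    def start(self):
        self.running = True


class ToySupervisor:
    def __init__(self, collections):
        # collections: dict mapping a collection key to a list of runners
        self.collections = collections

    def start_queue_runners(self, queue_runners=None):
        # Default: start everything registered under QUEUE_RUNNERS.
        if queue_runners is None:
            queue_runners = self.collections.get(QUEUE_RUNNERS, [])
        for qr in queue_runners:
            qr.start()

    def create_session(self):
        # Collected runners are started automatically with the session.
        self.start_queue_runners()
        return "session"


input_qr = ToyQueueRunner("input")  # registered in the collection
chief_qr = ToyQueueRunner("chief")  # created but never registered
sv = ToySupervisor({QUEUE_RUNNERS: [input_qr]})
sv.create_session()
print(input_qr.running, chief_qr.running)  # True False

# A runner outside the collection needs an explicit call.
sv.start_queue_runners([chief_qr])
print(chief_qr.running)  # True
```

In this model, an explicit call only matters for runners that were never added to the collection.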
I noticed that the queue_runners values passed to sv.start_queue_runners differ between line 264 and line 269 of inception_distributed_train.py. But aren't the chief_queue_runners also in the tf.GraphKeys.QUEUE_RUNNERS collection (all QUEUE_RUNNERS are obtained in line 263)? If so, line 269 is unnecessary, since the chief_queue_runners have already been started in line 264.
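Even if chief_queue_runners were in the collection, a second start call might simply be a no-op. The toy model below (not TensorFlow code, all names hypothetical) assumes the TF 1.x behaviour of QueueRunner.create_threads, which returns an empty list when the runner already has threads running for that session. That assumption should be checked against the TF version the Inception code targets.

```python
# Toy model (NOT TensorFlow code): starting the same runner twice for one
# session. Assumption: this mirrors TF 1.x QueueRunner.create_threads, which
# returns an empty list when the runner is already running for that session.


class ToyQueueRunner:
    def __init__(self, name, num_enqueue_ops=1):
        self.name = name
        self.num_enqueue_ops = num_enqueue_ops
        self._runs_per_session = {}  # session -> number of running threads

    def create_threads(self, sess):
        if self._runs_per_session.get(sess, 0) > 0:
            return []  # already running for this session: no new threads
        self._runs_per_session[sess] = self.num_enqueue_ops
        return [f"{self.name}-thread-{i}" for i in range(self.num_enqueue_ops)]


def start_queue_runners(sess, queue_runners):
    # Collect the threads created by every runner passed in.
    threads = []
    for qr in queue_runners:
        threads.extend(qr.create_threads(sess))
    return threads


sess = "session-0"
chief_qr = ToyQueueRunner("chief")
first = start_queue_runners(sess, [chief_qr])   # e.g. the call at line 264
second = start_queue_runners(sess, [chief_qr])  # e.g. the call at line 269
print(first, second)  # ['chief-thread-0'] []
```

Under that assumption, line 269 would be redundant but harmless if the chief runners were already collected.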
Besides, could you please explain, or point me to some references about, which queues are created in tf.train.Supervisor?
Thanks for your time!
Upvotes: 0
Views: 313
Reputation: 57883
Not an answer, but some general notes on how to find an answer :)
First of all, according to GitHub's blame, inception_distributed was checked in on April 13, while that comment in start_queue_runners was added on April 15, so it's possible that the functionality changed but wasn't updated in every place that uses it.
You could comment out that line and see if things still work. If not, you could add import pdb; pdb.set_trace() in the place where the queue runner gets created (i.e. here) and see who is creating those extra unattended queue runners.
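A non-interactive alternative to pdb.set_trace() is to wrap the constructor and record the call site of every new instance. Below is a minimal sketch on a hypothetical stand-in class (QueueRunnerStandIn and build_input_pipeline are illustrative names). Applying the same patch to tf.train.QueueRunner.__init__ in TF 1.x should work in principle, but that is an untested assumption.

```python
import traceback


class QueueRunnerStandIn:
    """Hypothetical stand-in for the class whose construction you want to trace."""

    def __init__(self, name):
        self.name = name


creation_sites = []
_original_init = QueueRunnerStandIn.__init__


def _traced_init(self, *args, **kwargs):
    # extract_stack()[-1] is this function; [-2] is the frame that called
    # the constructor. Record its function name, file and line number.
    caller = traceback.extract_stack()[-2]
    creation_sites.append((caller.filename, caller.lineno, caller.name))
    _original_init(self, *args, **kwargs)


QueueRunnerStandIn.__init__ = _traced_init


def build_input_pipeline():
    return QueueRunnerStandIn("input")


build_input_pipeline()
for filename, lineno, func in creation_sites:
    print(f"created in {func}() at {filename}:{lineno}")
```

Unlike a breakpoint, this records every construction without pausing the process, which can be convenient when several workers are running at once.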
Also, Supervisor development seems to have slowed down, and things are being moved over to FooSession (from a comment here). Those provide a more robust training architecture (your workers won't crash because of a temporary network error), but there aren't many examples of how to use them yet.
Upvotes: 1