Reputation: 793
In this TensorFlow distributed-training code example, sess.run([train_op, global_step]) will be invoked multiple times (in a while loop). Before executing a DAG of operations, TensorFlow first has to place the graph nodes on devices (the node-placement process).
In this scenario, I was wondering how many times node placement needs to be done. If the loop count is N, does TensorFlow perform node placement only once, or does it perform it N times?
Upvotes: 1
Views: 502
Reputation: 3358
Device placement for nodes happens only once. You can control the placement with directives such as tf.device or tf.train.replica_device_setter.
Since TensorFlow partitions the graph by device, adds send and recv nodes to each of these subgraphs, and performs additional setup, it would be expensive to re-place these nodes on different devices. You can, however, still change the graph between calls to session.run.
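To make this concrete, here is a minimal sketch (assuming the TensorFlow 1.x graph API, accessed through tf.compat.v1 on TF 2.x; the tensor names are illustrative). The device directive takes effect while the graph is being built, so the placement work is done once, no matter how many times the loop calls run():

```python
# Sketch assuming TensorFlow 1.x semantics (tf.compat.v1 on TF 2.x installs).
import tensorflow.compat.v1 as tf
tf.disable_eager_execution()

graph = tf.Graph()
with graph.as_default():
    # Pin these nodes to CPU 0; the device string is recorded on each node
    # at graph-construction time, before any session.run call.
    with tf.device("/cpu:0"):
        a = tf.constant([1.0, 2.0], name="a")
        b = tf.constant([3.0, 4.0], name="b")
        total = tf.add(a, b, name="total")

with tf.Session(graph=graph) as sess:
    # Placement and graph partitioning happen once; every run() below
    # reuses the same partitioned subgraphs.
    for _ in range(3):
        print(sess.run(total))  # [4. 6.]
```

Running the loop N times does not repeat the placement; it only re-executes the already-partitioned subgraphs.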
EDIT:
The device is an attribute of a node; it is set by this function when the graph is constructed. When you use tf.device, a device function is pushed onto a stack, and each subsequently created node calls the device functions on that stack to get its device assignment. The implementation can be found here: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/framework/ops.py#L2880
(TensorFlow uses deferred execution.) When the graph is evaluated, it is partitioned according to the device assignments, and the subgraphs are sent to their devices for execution.
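You can observe this attribute directly on the graph, before anything runs. A small sketch (again assuming the TF 1.x API via tf.compat.v1): a node built inside a tf.device scope carries the device string immediately, while a node built outside one has an empty device attribute until the placer assigns it at run time.

```python
# Sketch (TF 1.x API via tf.compat.v1): the device is just a node attribute,
# readable as soon as the graph is constructed.
import tensorflow.compat.v1 as tf
tf.disable_eager_execution()

g = tf.Graph()
with g.as_default():
    with tf.device("/cpu:0"):
        x = tf.constant(1.0, name="x")  # device set by the directive on the stack
    y = tf.constant(2.0, name="y")      # no directive: attribute stays empty for now

print(g.get_operation_by_name("x").device)  # a CPU device string
print(g.get_operation_by_name("y").device)  # "" until the placer runs
```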
Upvotes: 1