Reputation: 33
I have two GPUs and want to try some distributed training (model parallelism) in TensorFlow.
The two GPUs are:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: TITAN Xp COLLECTORS EDITION, pci bus id: 0000:04:00.0, compute capability: 6.1
/job:localhost/replica:0/task:0/device:GPU:1 -> device: 1, name: TITAN X (Pascal), pci bus id: 0000:82:00.0, compute capability: 6.1
My plan is to divide LeNet into two parts and assign each part to one GPU.
LeNet has 5 layers: I use with tf.device('/gpu:0'): to assign layer 1 to GPU 0, and with tf.device('/gpu:1'): to assign layers 2-5 to GPU 1.
I know there is no need for model parallelism in a model this small; I just want to try model parallelism out on a small model first.
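Roughly, my placement code looks like this (a simplified sketch; the real network also builds the fully connected layers 3-5 inside the second device block):

    import tensorflow as tf

    # Layer 1 (conv1) is pinned to the first GPU.
    with tf.device('/gpu:0'):
        with tf.name_scope('layer1'):
            x = tf.placeholder(tf.float32, [None, 32, 32, 1])
            conv1_w = tf.Variable(tf.truncated_normal([5, 5, 1, 6], stddev=0.1), name='conv1_w')
            conv1_b = tf.Variable(tf.zeros([6]), name='conv1_b')
            conv1 = tf.nn.relu(tf.nn.conv2d(x, conv1_w, strides=[1, 1, 1, 1], padding='VALID') + conv1_b)

    # Layers 2-5 are pinned to the second GPU (only layer 2 shown here).
    with tf.device('/gpu:1'):
        with tf.name_scope('layer2'):
            pool1 = tf.nn.max_pool(conv1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='VALID')
            conv2_w = tf.Variable(tf.truncated_normal([5, 5, 6, 16], stddev=0.1), name='conv2_w')
            conv2_b = tf.Variable(tf.zeros([16]), name='conv2_b')
            conv2 = tf.nn.relu(tf.nn.conv2d(pool1, conv2_w, strides=[1, 1, 1, 1], padding='VALID') + conv2_b)
        # layer3-layer5 (the fully connected layers) follow here in the same block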
The device-mapping log shows that all the ops have been assigned to the devices I intended:
layer5/fc3_b: (VariableV2): /job:localhost/replica:0/task:0/device:GPU:1
layer5/fc3_b/read: (Identity): /job:localhost/replica:0/task:0/device:GPU:1
layer5/fc3_b/Assign: (Assign): /job:localhost/replica:0/task:0/device:GPU:1
layer5/fc3_w: (VariableV2): /job:localhost/replica:0/task:0/device:GPU:1
layer5/fc3_w/read: (Identity): /job:localhost/replica:0/task:0/device:GPU:1
layer5/truncated_normal/TruncatedNormal: (TruncatedNormal): /job:localhost/replica:0/task:0/device:GPU:1
layer5/truncated_normal/mul: (Mul): /job:localhost/replica:0/task:0/device:GPU:1
layer5/truncated_normal: (Add): /job:localhost/replica:0/task:0/device:GPU:1
layer5/fc3_w/Assign: (Assign): /job:localhost/replica:0/task:0/device:GPU:1
layer4/fc2_b: (VariableV2): /job:localhost/replica:0/task:0/device:GPU:1
layer4/fc2_b/read: (Identity): /job:localhost/replica:0/task:0/device:GPU:1
layer4/fc2_b/Assign: (Assign): /job:localhost/replica:0/task:0/device:GPU:1
layer4/fc2_w: (VariableV2): /job:localhost/replica:0/task:0/device:GPU:1
layer4/fc2_w/read: (Identity): /job:localhost/replica:0/task:0/device:GPU:1
layer4/truncated_normal/TruncatedNormal: (TruncatedNormal): /job:localhost/replica:0/task:0/device:GPU:1
layer4/truncated_normal/mul: (Mul): /job:localhost/replica:0/task:0/device:GPU:1
layer4/truncated_normal: (Add): /job:localhost/replica:0/task:0/device:GPU:1
layer4/fc2_w/Assign: (Assign): /job:localhost/replica:0/task:0/device:GPU:1
layer3/fc1_b: (VariableV2): /job:localhost/replica:0/task:0/device:GPU:1
layer3/fc1_b/read: (Identity): /job:localhost/replica:0/task:0/device:GPU:1
layer3/fc1_b/Assign: (Assign): /job:localhost/replica:0/task:0/device:GPU:1
layer3/fc1_w: (VariableV2): /job:localhost/replica:0/task:0/device:GPU:1
layer3/fc1_w/read: (Identity): /job:localhost/replica:0/task:0/device:GPU:1
layer3/truncated_normal/TruncatedNormal: (TruncatedNormal): /job:localhost/replica:0/task:0/device:GPU:1
layer3/truncated_normal/mul: (Mul): /job:localhost/replica:0/task:0/device:GPU:1
layer3/truncated_normal: (Add): /job:localhost/replica:0/task:0/device:GPU:1
layer3/fc1_w/Assign: (Assign): /job:localhost/replica:0/task:0/device:GPU:1
layer2/conv2_b: (VariableV2): /job:localhost/replica:0/task:0/device:GPU:1
layer2/conv2_b/read: (Identity): /job:localhost/replica:0/task:0/device:GPU:1
layer2/conv2_b/Assign: (Assign): /job:localhost/replica:0/task:0/device:GPU:1
layer2/conv2_w: (VariableV2): /job:localhost/replica:0/task:0/device:GPU:1
layer2/conv2_w/read: (Identity): /job:localhost/replica:0/task:0/device:GPU:1
layer2/truncated_normal/TruncatedNormal: (TruncatedNormal): /job:localhost/replica:0/task:0/device:GPU:1
layer2/truncated_normal/mul: (Mul): /job:localhost/replica:0/task:0/device:GPU:1
layer2/truncated_normal: (Add): /job:localhost/replica:0/task:0/device:GPU:1
layer2/conv2_w/Assign: (Assign): /job:localhost/replica:0/task:0/device:GPU:1
init/NoOp_1: (NoOp): /job:localhost/replica:0/task:0/device:GPU:1
layer1/conv1_b: (VariableV2): /job:localhost/replica:0/task:0/device:GPU:0
layer1/conv1_b/read: (Identity): /job:localhost/replica:0/task:0/device:GPU:0
layer1/conv1_b/Assign: (Assign): /job:localhost/replica:0/task:0/device:GPU:0
layer1/conv1_w: (VariableV2): /job:localhost/replica:0/task:0/device:GPU:0
layer1/conv1_w/read: (Identity): /job:localhost/replica:0/task:0/device:GPU:0
layer1/truncated_normal/TruncatedNormal: (TruncatedNormal): /job:localhost/replica:0/task:0/device:GPU:0
layer1/truncated_normal/mul: (Mul): /job:localhost/replica:0/task:0/device:GPU:0
layer1/truncated_normal: (Add): /job:localhost/replica:0/task:0/device:GPU:0
layer1/conv1_w/Assign: (Assign): /job:localhost/replica:0/task:0/device:GPU:0
But I got a different result in timeline.json, as shown in the figure below.
The timeline suggests that the ops of layers 2-5 are launched on GPU 1 but actually run on GPU 0, which is not what I intended by using with tf.device('/gpu:1').
Is this expected in TensorFlow?
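For reference, this is roughly how I generate timeline.json (standard tf.RunOptions/Timeline usage; train_op stands in for my actual training op):

    import tensorflow as tf
    from tensorflow.python.client import timeline

    run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
    run_metadata = tf.RunMetadata()

    with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
        sess.run(tf.global_variables_initializer())
        # train_op is a placeholder for my actual training step.
        sess.run(train_op, options=run_options, run_metadata=run_metadata)

    # Convert the collected step stats into the Chrome trace format
    # that chrome://tracing (and the timeline figure above) displays.
    tl = timeline.Timeline(run_metadata.step_stats)
    with open('timeline.json', 'w') as f:
        f.write(tl.generate_chrome_trace_format())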
This is the first time I have asked a question on Stack Overflow; if any other information is needed, please let me know. Thanks.
Upvotes: 3
Views: 1541
Reputation: 56
This is just an artifact of the Chrome Trace Event Format.
The stream "/job:localhost/replica:0/task:0/device:GPU:0 Compute" shows the time taken to launch/queue the CUDA kernels for ops being executed on GPU:0.
The stream "/job:localhost/replica:0/task:0/device:GPU:1 Compute" shows the time taken to launch/queue the CUDA kernels for ops being executed on GPU:1.
All the streams matching "/device:GPU:0/stream.* Compute" show the time to actually execute the ops on the GPUs. To find out which GPU an op was actually executed on, you need to look at the streams matching "/job:localhost/replica:0/task:0/device:GPU:.* Compute".
Hope this answers your question.
Upvotes: 4