tobe

Reputation: 1741

Can TensorFlow schedule operations to all available GPUs automatically?

We have read the TensorFlow paper on scheduling. It suggests that TensorFlow may simulate execution of the graph to find the "right" device on which to place each operation.

However, we tested with tf.Session(config=tf.ConfigProto(log_device_placement=True)) without specifying any device, and found that all the operations are placed on the first GPU.

The log looks like this:

Adam/epsilon: /job:localhost/replica:0/task:0/gpu:0
I tensorflow/core/common_runtime/simple_placer.cc:818] Adam/epsilon: /job:localhost/replica:0/task:0/gpu:0
Adam/beta2: /job:localhost/replica:0/task:0/gpu:0
I tensorflow/core/common_runtime/simple_placer.cc:818] Adam/beta2: /job:localhost/replica:0/task:0/gpu:0
Adam/beta1: /job:localhost/replica:0/task:0/gpu:0
I tensorflow/core/common_runtime/simple_placer.cc:818] Adam/beta1: /job:localhost/replica:0/task:0/gpu:0
Adam/learning_rate: /job:localhost/replica:0/task:0/gpu:0
I tensorflow/core/common_runtime/simple_placer.cc:818] Adam/learning_rate: /job:localhost/replica:0/task:0/gpu:0
Variable_3/Adam_1: /job:localhost/replica:0/task:0/gpu:0
I tensorflow/core/common_runtime/simple_placer.cc:818] Variable_3/Adam_1: /job:localhost/replica:0/task:0/gpu:0
Variable_3/Adam_1/read: /job:localhost/replica:0/task:0/gpu:0
I tensorflow/core/common_runtime/simple_placer.cc:818] Variable_3/Adam_1/read: /job:localhost/replica:0/task:0/gpu:0
Variable_3/Adam_1/Assign: /job:localhost/replica:0/task:0/gpu:0
I tensorflow/core/common_runtime/simple_placer.cc:818] Variable_3/Adam_1/Assign: /job:localhost/replica:0/task:0/gpu:0
Variable_3/Adam: /job:localhost/replica:0/task:0/gpu:0
I tensorflow/core/common_runtime/simple_placer.cc:818] Variable_3/Adam: /job:localhost/replica:0/task:0/gpu:0
Variable_3/Adam/read: /job:localhost/replica:0/task:0/gpu:0
I tensorflow/core/common_runtime/simple_placer.cc:818] Variable_3/Adam/read: /job:localhost/replica:0/task:0/gpu:0
Variable_3/Adam/Assign: /job:localhost/replica:0/task:0/gpu:0
I tensorflow/core/common_runtime/simple_placer.cc:818] Variable_3/Adam/Assign: /job:localhost/replica:0/task:0/gpu:0
Variable_2/Adam_1: /job:localhost/replica:0/task:0/gpu:0
I tensorflow/core/common_runtime/simple_placer.cc:818] Variable_2/Adam_1: /job:localhost/replica:0/task:0/gpu:0
Variable_2/Adam_1/read: /job:localhost/replica:0/task:0/gpu:0
I tensorflow/core/common_runtime/simple_placer.cc:818] Variable_2/Adam_1/read: /job:localhost/replica:0/task:0/gpu:0
Variable_2/Adam_1/Assign: /job:localhost/replica:0/task:0/gpu:0
I tensorflow/core/common_runtime/simple_placer.cc:818] Variable_2/Adam_1/Assign: /job:localhost/replica:0/task:0/gpu:0
Variable_2/Adam: /job:localhost/replica:0/task:0/gpu:0
I tensorflow/core/common_runtime/simple_placer.cc:818] Variable_2/Adam: /job:localhost/replica:0/task:0/gpu:0
Variable_2/Adam/read: /job:localhost/replica:0/task:0/gpu:0
I tensorflow/core/common_runtime/simple_placer.cc:818] Variable_2/Adam/read: /job:localhost/replica:0/task:0/gpu:0
Variable_2/Adam/Assign: /job:localhost/replica:0/task:0/gpu:0
I tensorflow/core/common_runtime/simple_placer.cc:818] Variable_2/Adam/Assign: /job:localhost/replica:0/task:0/gpu:0
Variable_1/Adam_1: /job:localhost/replica:0/task:0/gpu:0
I tensorflow/core/common_runtime/simple_placer.cc:818] Variable_1/Adam_1: /job:localhost/replica:0/task:0/gpu:0

The variables are also placed on the GPU. I assume the scheduler is not good enough yet, and that the best practice is for users to specify explicitly which operations should run on the CPU or GPU, especially when multiple GPUs are available. Is that right?
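For example, if explicit placement is the recommended approach, I imagine something like the following sketch with the graph API (the device string '/gpu:1' and the toy ops are just illustrative assumptions):

    import tensorflow as tf

    # Pin part of the graph to the second GPU explicitly (assumes '/gpu:1' exists).
    with tf.device('/gpu:1'):
        a = tf.constant([1.0, 2.0, 3.0], name='a')
        b = tf.constant([4.0, 5.0, 6.0], name='b')
        c = a + b

    # allow_soft_placement lets TensorFlow fall back to another device
    # if an op has no kernel for the requested one.
    config = tf.ConfigProto(log_device_placement=True, allow_soft_placement=True)
    with tf.Session(config=config) as sess:
        print(sess.run(c))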

Upvotes: 2

Views: 881

Answers (1)

user6621932

Reputation:

As of v0.9, TensorFlow places all ops on the first GPU that you have, so what you are observing is 100% expected. Now, if your question is "Could TensorFlow automatically distribute my graph across my 4 GPUs without my intervention?", the answer as of August 2016 is no.

If you're trying to harness the power of all the GPUs available on your local machine, check out the multi-GPU variation of the cifar10 tutorial. The next level would be replicated training with distributed TensorFlow, but that might be overkill for what you are trying to do.
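To sketch the idea behind that multi-GPU (tower) approach, assuming two local GPUs and a made-up tower_loss() standing in for a real model, it looks roughly like this:

    import tensorflow as tf

    NUM_GPUS = 2  # assumption: two local GPUs

    # A tiny shared parameter, kept on the CPU so every tower reads the same copy.
    with tf.device('/cpu:0'):
        w = tf.Variable(tf.ones([10, 1]), name='w')

    def tower_loss(batch):
        # hypothetical stand-in for a real model; returns a scalar loss
        return tf.reduce_mean(tf.square(tf.matmul(batch, w)))

    # Split one input batch into per-GPU shards and build one tower per GPU.
    batch = tf.random_normal([64, 10])
    shards = tf.split(batch, NUM_GPUS, axis=0)  # TF 1.x split signature

    tower_losses = []
    for i in range(NUM_GPUS):
        with tf.device('/gpu:%d' % i):
            tower_losses.append(tower_loss(shards[i]))

    # Average the per-GPU losses on the CPU and optimize the shared weights.
    with tf.device('/cpu:0'):
        total_loss = tf.add_n(tower_losses) / NUM_GPUS
        train_op = tf.train.AdamOptimizer(1e-3).minimize(total_loss)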

And with all the virtualization going on these days, the question of which device a certain operation is assigned to may soon become irrelevant.

Upvotes: 4
