Reputation: 830
If I run a TensorFlow model (e.g. cifar10) with one GPU on a multi-GPU machine, TensorFlow creates devices for, and allocates (training/inference) memory on, all of the available GPUs. Since I set num_gpus to 1, it runs on only one GPU, yet I can see the same process on the other GPUs as well. Is this intended, and is there a rationale for it? I quickly checked other DL frameworks such as Caffe, and their design/behavior is different. Of course, I can specify the device at the code level (sketched below), but I'm curious. This default behavior might also be annoying for other users if the machine is shared.
tensorflow/core/common_runtime/gpu/gpu_device.cc:977] Creating TensorFlow device (/gpu:0) -> (device: 0, name:
tensorflow/core/common_runtime/gpu/gpu_device.cc:977] Creating TensorFlow device (/gpu:1) -> (device: 1, name:
tensorflow/core/common_runtime/gpu/gpu_device.cc:977] Creating TensorFlow device (/gpu:2) -> (device: 2, name:
tensorflow/core/common_runtime/gpu/gpu_device.cc:977] Creating TensorFlow device (/gpu:3) -> (device: 3, name: ...
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0     67056    C   python                                       15623MiB |
|    1     67056    C   python                                       15499MiB |
|    2     67056    C   python                                       15499MiB |
|    3     67056    C   python                                       15499MiB |
|    4     67056    C   python                                       15499MiB |
+-----------------------------------------------------------------------------+
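By specifying the device at the code level, I mean something like this (a minimal TF 1.x sketch):

import tensorflow as tf

# Pin graph construction to the first GPU. Placement alone does not
# stop TensorFlow from reserving memory on the other visible GPUs.
with tf.device("/gpu:0"):
    a = tf.constant([1.0, 2.0])
    b = tf.reduce_sum(a * 2.0)

with tf.Session() as sess:
    print(sess.run(b))  # 6.0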
Upvotes: 0
Views: 1088
Reputation: 126174
By default, at startup TensorFlow allocates almost all of the GPU memory on all devices that are visible to it. However, unless you specify otherwise (e.g. in a with tf.device(): block), it will only place operations on the device known to TensorFlow as "/gpu:0", and the other GPUs will be idle.
There are a couple of workarounds (both are sketched in the code below):

Set the environment variable CUDA_VISIBLE_DEVICES=0 (or 1, 2, etc. as appropriate) before launching python to control which devices are visible to TensorFlow. This can also be configured using the tf.ConfigProto option visible_device_list when creating your first tf.Session.

Set the tf.ConfigProto option allow_growth=True when creating your first tf.Session. This will prevent TensorFlow from pre-allocating all of the GPU memory.
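A rough sketch of both workarounds with the TF 1.x API (the device index "0" is just an example):

import os

# Workaround 1: hide all but one GPU from the process. This must be set
# before TensorFlow initializes CUDA, i.e. before importing tensorflow.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import tensorflow as tf

config = tf.ConfigProto()
# Equivalent to CUDA_VISIBLE_DEVICES, but set per-session:
config.gpu_options.visible_device_list = "0"
# Workaround 2: allocate GPU memory on demand instead of all upfront.
config.gpu_options.allow_growth = True

with tf.Session(config=config) as sess:
    pass  # build and run the model here

Note that visible_device_list indices are relative to whatever CUDA_VISIBLE_DEVICES exposes, so in practice you would use one mechanism or the other.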
Upvotes: 2