Reputation: 10759
Occasionally, when I run TensorFlow on a single GPU in a multi-GPU setup, the code executes on one GPU but allocates memory on another. This, for obvious reasons, causes a major slowdown.
As an example, see the result of nvidia-smi below. Here, a colleague of mine is using GPUs 0 and 1 (processes 32918 and 33112), and I start TensorFlow with the following commands (before import tensorflow):
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
where gpu_id is 2, 3, and 4 respectively for my three processes. As we can see, the memory is correctly allocated on GPUs 2, 3, and 4, but the code is executed somewhere else! In this case, on GPUs 0, 1, and 7.
Wed May 17 17:04:01 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.48 Driver Version: 367.48 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla K80 Off | 0000:04:00.0 Off | 0 |
| N/A 41C P0 75W / 149W | 278MiB / 11439MiB | 100% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla K80 Off | 0000:05:00.0 Off | 0 |
| N/A 36C P0 89W / 149W | 278MiB / 11439MiB | 100% Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla K80 Off | 0000:08:00.0 Off | 0 |
| N/A 61C P0 58W / 149W | 6265MiB / 11439MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 3 Tesla K80 Off | 0000:09:00.0 Off | 0 |
| N/A 42C P0 70W / 149W | 8313MiB / 11439MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 4 Tesla K80 Off | 0000:84:00.0 Off | 0 |
| N/A 51C P0 55W / 149W | 8311MiB / 11439MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 5 Tesla K80 Off | 0000:85:00.0 Off | 0 |
| N/A 29C P0 68W / 149W | 0MiB / 11439MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 6 Tesla K80 Off | 0000:88:00.0 Off | 0 |
| N/A 31C P0 54W / 149W | 0MiB / 11439MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 7 Tesla K80 Off | 0000:89:00.0 Off | 0 |
| N/A 27C P0 68W / 149W | 0MiB / 11439MiB | 33% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 32918 C python 274MiB |
| 1 33112 C python 274MiB |
| 2 34891 C ...sadl/anaconda3/envs/tensorflow/bin/python 6259MiB |
| 3 34989 C ...sadl/anaconda3/envs/tensorflow/bin/python 8309MiB |
| 4 35075 C ...sadl/anaconda3/envs/tensorflow/bin/python 8307MiB |
+-----------------------------------------------------------------------------+
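For reference, a minimal sketch of the kind of thing each of these processes runs (the matmul workload here is only illustrative), with device placement logging turned on so the chosen device shows up in the process log:
import os

# Same environment setup as above, set before TensorFlow is imported.
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "2"  # illustrative gpu_id

import tensorflow as tf

# log_device_placement prints the device chosen for every op, so the log shows
# whether the computation really lands on the single visible GPU.
config = tf.ConfigProto(log_device_placement=True)
with tf.Session(config=config) as sess:
    x = tf.random_normal([1000, 1000])
    y = tf.matmul(x, x)
    sess.run(y)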
It seems that TensorFlow, for some reason, is partially ignoring the CUDA_VISIBLE_DEVICES setting.
I am not using any device placement commands in the code.
This was experienced with TensorFlow 1.1 running on Ubuntu 16.04 and has happened to me across a range of different scenarios.
Is there some known scenario in which this could happen? If so, is there anything I can do about it?
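For what it's worth, a sketch (TF 1.x API) of listing the devices TensorFlow itself reports inside such a process:
from tensorflow.python.client import device_lib

# With CUDA_VISIBLE_DEVICES set to a single id before the import, this should
# report exactly one GPU (plus the CPU) for the process.
print([d.name for d in device_lib.list_local_devices()])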
Upvotes: 1
Views: 564
Reputation: 10759
I solved the issue.
It seems the problem had to do with nvidia-smi and not TensorFlow: if you enable persistence mode on the GPUs via sudo nvidia-smi -pm 1, the correct status is shown, e.g. something like:
Fri May 19 15:28:06 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.48 Driver Version: 367.48 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla K80 On | 0000:04:00.0 Off | 0 |
| N/A 60C P0 143W / 149W | 6263MiB / 11439MiB | 97% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla K80 On | 0000:05:00.0 Off | 0 |
| N/A 46C P0 136W / 149W | 8311MiB / 11439MiB | 99% Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla K80 On | 0000:08:00.0 Off | 0 |
| N/A 64C P0 110W / 149W | 8311MiB / 11439MiB | 67% Default |
+-------------------------------+----------------------+----------------------+
| 3 Tesla K80 On | 0000:09:00.0 Off | 0 |
| N/A 48C P0 142W / 149W | 8311MiB / 11439MiB | 99% Default |
+-------------------------------+----------------------+----------------------+
| 4 Tesla K80 On | 0000:84:00.0 Off | 0 |
| N/A 32C P8 27W / 149W | 0MiB / 11439MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 5 Tesla K80 On | 0000:85:00.0 Off | 0 |
| N/A 26C P8 28W / 149W | 0MiB / 11439MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 6 Tesla K80 On | 0000:88:00.0 Off | 0 |
| N/A 28C P8 26W / 149W | 0MiB / 11439MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 7 Tesla K80 On | 0000:89:00.0 Off | 0 |
| N/A 25C P8 28W / 149W | 0MiB / 11439MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 42840 C ...sadl/anaconda3/envs/tensorflow/bin/python 6259MiB |
| 1 42878 C ...sadl/anaconda3/envs/tensorflow/bin/python 8307MiB |
| 2 43264 C ...sadl/anaconda3/envs/tensorflow/bin/python 8307MiB |
| 3 4721 C python 8307MiB |
+-----------------------------------------------------------------------------+
Thanks for the input in solving this.
Upvotes: 0
Reputation: 1802
One possible cause is nvidia-smi itself: the order in which it lists GPUs is not necessarily the same as the CUDA device IDs.
"It is recommended that users desiring consistency use either UUID or PCI bus ID, since device enumeration ordering is not guaranteed to be consistent"
"FASTEST_FIRST causes CUDA to guess which device is fastest using a simple heuristic, and make that device 0, leaving the order of the rest of the devices unspecified. PCI_BUS_ID orders devices by PCI bus ID in ascending order."
Have a look here: http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#env-vars
Also discussed here: Inconsistency of IDs between 'nvidia-smi -L' and cuDeviceGetName()
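As a rough sketch of the resulting workaround (assumed here, not quoted from the docs): pin the enumeration to PCI bus order before CUDA/TensorFlow is initialised, then compare the PCI bus ids TensorFlow reports with the Bus-Id column of nvidia-smi:
import os

# Pin CUDA's enumeration to PCI bus order; with the default (FASTEST_FIRST),
# CUDA device 0 need not be the card nvidia-smi lists as GPU 0.
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"

from tensorflow.python.client import device_lib

# physical_device_desc contains the PCI bus id, which can be matched against
# the Bus-Id column of nvidia-smi to confirm the mapping.
for device in device_lib.list_local_devices():
    if device.device_type == "GPU":
        print(device.name, "->", device.physical_device_desc)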
Upvotes: 1