Reputation: 3011
I am learning to use Tensorflow for object detection. To speed up the training process, I have taken a AWS g3.16xlarge instance which has 4 GPUs. I am using the following code to run training process:
export CUDA_VISIBLE_DEVICES=0,1,2,3
python object_detection/train.py --logtostderr --pipeline_config_path=/home/ubuntu/builder/rcnn.config --train_dir=/home/ubuntu/builder/experiments/training/
Inside the rcnn.config - i have set the batch-size = 1
. During runtime I get the following output:
console output
2018-11-09 07:25:50.104310: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Device peer to peer matrix
2018-11-09 07:25:50.104385: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1051] DMA: 0 1 2 3
2018-11-09 07:25:50.104395: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1061] 0: Y N N N
2018-11-09 07:25:50.104402: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1061] 1: N Y N N
2018-11-09 07:25:50.104409: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1061] 2: N N Y N
2018-11-09 07:25:50.104416: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1061] 3: N N N Y
2018-11-09 07:25:50.104429: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: Tesla M60, pci bus id: 0000:00:1b.0, compute capability: 5.2)
2018-11-09 07:25:50.104439: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:1) -> (device: 1, name: Tesla M60, pci bus id: 0000:00:1c.0, compute capability: 5.2)
2018-11-09 07:25:50.104446: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:2) -> (device: 2, name: Tesla M60, pci bus id: 0000:00:1d.0, compute capability: 5.2)
2018-11-09 07:25:50.104455: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:3) -> (device: 3, name: Tesla M60, pci bus id: 0000:00:1e.0, compute capability: 5.2)
When I run nvidia-smi
, I get the following output:
nvidia-smi output
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.26 Driver Version: 375.26 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla M60 Off | 0000:00:1B.0 Off | 0 |
| N/A 52C P0 129W / 150W | 7382MiB / 7612MiB | 92% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla M60 Off | 0000:00:1C.0 Off | 0 |
| N/A 33C P0 38W / 150W | 7237MiB / 7612MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla M60 Off | 0000:00:1D.0 Off | 0 |
| N/A 40C P0 38W / 150W | 7237MiB / 7612MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 3 Tesla M60 Off | 0000:00:1E.0 Off | 0 |
| N/A 34C P0 39W / 150W | 7237MiB / 7612MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 97860 C python 7378MiB |
| 1 97860 C python 7233MiB |
| 2 97860 C python 7233MiB |
| 3 97860 C python 7233MiB |
+-----------------------------------------------------------------------------+
and **nvidia-smi dmon**
provides the following output:
# gpu pwr temp sm mem enc dec mclk pclk
# Idx W C % % % % MHz MHz
0 158 69 90 69 0 0 2505 1177
1 38 36 0 0 0 0 2505 556
2 38 45 0 0 0 0 2505 556
3 39 37 0 0 0 0 2505 556
I am confused with each of the output. While I read the console output as the program is recognizing the availability of 4 different gpus, in the nvidia-smi output the volatile GPU-Util percentage is shown only for the first GPU and for the rest it is zero. However the same table prints memory usage for all the 4 gpu's at the bottom. And the nvidia-smi dmon prints the sm values only for first gpu and for the others it is zero. From this blog I understand the zero in dmon
indicates that GPU is free.
What I want to understand is, does the train.py utilizes all the 4 GPU's that I have in my instance. If it is not utilizing all the GPU's how do I ensure the object_detection/train.py
of tensorflow is optimized for all the GPU's.
Upvotes: 6
Views: 13370
Reputation: 21765
Python code to check if GPU is found and available for using with tensorflow
:
## Libraries import
import tensorflow as tf
## Test GPU
device_name = tf.test.gpu_device_name()
if device_name != '/device:GPU:0':
raise SystemError('GPU device not found')
print('Found GPU at: {}'.format(device_name))
print('')
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
Upvotes: 1
Reputation: 2129
Check if it's returning list of all GPUs.
tf.test.gpu_device_name()
Returns the name of a GPU device if available or the empty string.
then you can do something like this to use all the available GPUs.
# Creates a graph.
c = []
for d in ['/device:GPU:2', '/device:GPU:3']:
with tf.device(d):
a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3])
b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2])
c.append(tf.matmul(a, b))
with tf.device('/cpu:0'):
sum = tf.add_n(c)
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op.
print(sess.run(sum))
You see below output:
Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: Tesla K20m, pci bus
id: 0000:02:00.0
/job:localhost/replica:0/task:0/device:GPU:1 -> device: 1, name: Tesla K20m, pci bus
id: 0000:03:00.0
/job:localhost/replica:0/task:0/device:GPU:2 -> device: 2, name: Tesla K20m, pci bus
id: 0000:83:00.0
/job:localhost/replica:0/task:0/device:GPU:3 -> device: 3, name: Tesla K20m, pci bus
id: 0000:84:00.0
Const_3: /job:localhost/replica:0/task:0/device:GPU:3
Const_2: /job:localhost/replica:0/task:0/device:GPU:3
MatMul_1: /job:localhost/replica:0/task:0/device:GPU:3
Const_1: /job:localhost/replica:0/task:0/device:GPU:2
Const: /job:localhost/replica:0/task:0/device:GPU:2
MatMul: /job:localhost/replica:0/task:0/device:GPU:2
AddN: /job:localhost/replica:0/task:0/cpu:0
[[ 44. 56.]
[ 98. 128.]]
Upvotes: 4