Reputation: 1255
I set up an AWS Deep Learning machine using the AMI. Now I'm trying to run the simple starter example from the TensorFlow documentation:
# Creates a graph.
a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op.
print(sess.run(c))
But it appears that my machine is not using my GPUs.
MatMul_2: (MatMul): /job:localhost/replica:0/task:0/cpu:0
2017-07-09 00:51:03.830238: I tensorflow/core/common_runtime/simple_placer.cc:847] MatMul_2: (MatMul)/job:localhost/replica:0/task:0/cpu:0
MatMul_1: (MatMul): /job:localhost/replica:0/task:0/cpu:0
2017-07-09 00:51:03.830259: I tensorflow/core/common_runtime/simple_placer.cc:847] MatMul_1: (MatMul)/job:localhost/replica:0/task:0/cpu:0
MatMul: (MatMul): /job:localhost/replica:0/task:0/cpu:0
2017-07-09 00:51:03.830271: I tensorflow/core/common_runtime/simple_placer.cc:847] MatMul: (MatMul)/job:localhost/replica:0/task:0/cpu:0
b_2: (Const): /job:localhost/replica:0/task:0/cpu:0
2017-07-09 00:51:03.830283: I tensorflow/core/common_runtime/simple_placer.cc:847] b_2: (Const)/job:localhost/replica:0/task:0/cpu:0
a_2: (Const): /job:localhost/replica:0/task:0/cpu:0
2017-07-09 00:51:03.830312: I tensorflow/core/common_runtime/simple_placer.cc:847] a_2: (Const)/job:localhost/replica:0/task:0/cpu:0
b_1: (Const): /job:localhost/replica:0/task:0/cpu:0
2017-07-09 00:51:03.830324: I tensorflow/core/common_runtime/simple_placer.cc:847] b_1: (Const)/job:localhost/replica:0/task:0/cpu:0
a_1: (Const): /job:localhost/replica:0/task:0/cpu:0
2017-07-09 00:51:03.830337: I tensorflow/core/common_runtime/simple_placer.cc:847] a_1: (Const)/job:localhost/replica:0/task:0/cpu:0
b: (Const): /job:localhost/replica:0/task:0/cpu:0
2017-07-09 00:51:03.830348: I tensorflow/core/common_runtime/simple_placer.cc:847] b: (Const)/job:localhost/replica:0/task:0/cpu:0
a: (Const): /job:localhost/replica:0/task:0/cpu:0
2017-07-09 00:51:03.830358: I tensorflow/core/common_runtime/simple_placer.cc:847] a: (Const)/job:localhost/replica:0/task:0/cpu:0
If I try to manually place the op on the GPU using with tf.device('/gpu:0'):
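Roughly like this, using the same constants as above:
# Pin the graph explicitly to the first GPU.
with tf.device('/gpu:0'):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
    c = tf.matmul(a, b)
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
print(sess.run(c))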
I get the following error:
InvalidArgumentError: Cannot assign a device for operation 'MatMul_3': Operation was explicitly assigned to /device:GPU:0 but available devices are [ /job:localhost/replica:0/task:0/cpu:0 ]. Make sure the device specification refers to a valid device.
[[Node: MatMul_3 = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/device:GPU:0"](a_3, b_3)]]
The only change I made to the AMI was updating TensorFlow to the latest version.
Here's what I see when I run watch nvidia-smi:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.57 Driver Version: 367.57 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla K80 On | 0000:00:1E.0 Off | 0 |
| N/A 44C P8 27W / 149W | 0MiB / 11439MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
Upvotes: 2
Views: 1520
Reputation: 70
1. Check your instance type: did you actually select a GPU instance?
Use "watch nvidia-smi" to see the GPU info.
2. Check your AMI and your TensorFlow version; maybe the installed build doesn't support GPU or has some wrong config (a quick check is shown below).
I use this AMI: Deep Learning AMI Amazon Linux (ami-296e7850).
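For point 2, a minimal sketch (assuming the TF 1.x API used in the question) to see whether the installed TensorFlow build can use the GPU at all:
import tensorflow as tf
from tensorflow.python.client import device_lib

# Was this TensorFlow package built with CUDA support?
print(tf.test.is_built_with_cuda())      # False -> CPU-only build installed

# Which devices does TensorFlow actually see?
print(device_lib.list_local_devices())   # a working setup lists a /gpu:0 entry
If the first check prints False, the upgrade most likely pulled in the CPU-only tensorflow package rather than tensorflow-gpu, which would explain why only cpu:0 shows up in the device placement log.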
Upvotes: 2