Tensorflow 1.8 GPU version doesn't seem to use GPU on windows

Question

I am trying to use Alexnet CNN from tensorflow, I can train model without any error messages and have access to the tensorboard, and I can test the trained model with no error.

Only problem is, while training, GPU usage stays mostly 0% sometimes fluctuates to 25% but rarely. However all my CPUS are working crazy like over 90%. So I am assuming it is using CPU instead of GPU.

Here is my setup

Windows 8.1 x64
GPU 1070 driver version 3.88
tensorflow-gpu 1.8.0
CUDA toolkit v9.0
cuDNN version 7

I can import tensorflow in python with no error, I ran some tests to see if it is installed correctly,

Test 1

from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())  

[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 625346735515728619
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 6911164212
locality {
  bus_id: 1
  links {
  }
}
incarnation: 15764160474642097170
physical_device_desc: "device: 0, name: GeForce GTX 1070, pci bus id: 0000:01:00.0, compute capability: 6.1"
]

Test 2

import tensorflow as tf
hello = tf.constant('Hello, TensorFlow!')
sess = tf.Session()
print(sess.run(hello))

b'Hello, TensorFlow!'

Test3

import ctypes
import imp
import sys

def main():
  try:
    import tensorflow as tf
    print("TensorFlow successfully installed.")
    if tf.test.is_built_with_cuda():
      print("The installed version of TensorFlow includes GPU support.")
    else:
      print("The installed version of TensorFlow does not include GPU support.")
    sys.exit(0)
  except ImportError:
    print("ERROR: Failed to import the TensorFlow module.")

  candidate_explanation = False

  python_version = sys.version_info.major, sys.version_info.minor
  print("
- Python version is %d.%d." % python_version)
  if not (python_version == (3, 5) or python_version == (3, 6)):
    candidate_explanation = True
    print("- The official distribution of TensorFlow for Windows requires "
          "Python version 3.5 or 3.6.")

  try:
    _, pathname, _ = imp.find_module("tensorflow")
    print("
- TensorFlow is installed at: %s" % pathname)
  except ImportError:
    candidate_explanation = False
    print("""
- No module named TensorFlow is installed in this Python environment. You may
  install it using the command `pip install tensorflow`.""")

  try:
    msvcp140 = ctypes.WinDLL("msvcp140.dll")
  except OSError:
    candidate_explanation = True
    print("""
- Could not load 'msvcp140.dll'. TensorFlow requires that this DLL be
  installed in a directory that is named in your %PATH% environment
  variable. You may install this DLL by downloading Microsoft Visual
  C++ 2015 Redistributable Update 3 from this URL:
  https://www.microsoft.com/en-us/download/details.aspx?id=53587""")

  try:
    cudart64_80 = ctypes.WinDLL("cudart64_80.dll")
  except OSError:
    candidate_explanation = True
    print("""
- Could not load 'cudart64_80.dll'. The GPU version of TensorFlow
  requires that this DLL be installed in a directory that is named in
  your %PATH% environment variable. Download and install CUDA 8.0 from
  this URL: https://developer.nvidia.com/cuda-toolkit""")

  try:
    nvcuda = ctypes.WinDLL("nvcuda.dll")
  except OSError:
    candidate_explanation = True
    print("""
- Could not load 'nvcuda.dll'. The GPU version of TensorFlow requires that
  this DLL be installed in a directory that is named in your %PATH%
  environment variable. Typically it is installed in 'C:\Windows\System32'.
  If it is not present, ensure that you have a CUDA-capable GPU with the
  correct driver installed.""")

  cudnn5_found = False
  try:
    cudnn5 = ctypes.WinDLL("cudnn64_5.dll")
    cudnn5_found = True
  except OSError:
    candidate_explanation = True
    print("""
- Could not load 'cudnn64_5.dll'. The GPU version of TensorFlow
  requires that this DLL be installed in a directory that is named in
  your %PATH% environment variable. Note that installing cuDNN is a
  separate step from installing CUDA, and it is often found in a
  different directory from the CUDA DLLs. You may install the
  necessary DLL by downloading cuDNN 5.1 from this URL:
  https://developer.nvidia.com/cudnn""")

  cudnn6_found = False
  try:
    cudnn = ctypes.WinDLL("cudnn64_6.dll")
    cudnn6_found = True
  except OSError:
    candidate_explanation = True

  if not cudnn5_found or not cudnn6_found:
    print()
    if not cudnn5_found and not cudnn6_found:
      print("- Could not find cuDNN.")
    elif not cudnn5_found:
      print("- Could not find cuDNN 5.1.")
    else:
      print("- Could not find cuDNN 6.")
      print("""
  The GPU version of TensorFlow requires that the correct cuDNN DLL be installed
  in a directory that is named in your %PATH% environment variable. Note that
  installing cuDNN is a separate step from installing CUDA, and it is often
  found in a different directory from the CUDA DLLs. The correct version of
  cuDNN depends on your version of TensorFlow:

  * TensorFlow 1.2.1 or earlier requires cuDNN 5.1. ('cudnn64_5.dll')
  * TensorFlow 1.3 or later requires cuDNN 6. ('cudnn64_6.dll')

  You may install the necessary DLL by downloading cuDNN from this URL:
  https://developer.nvidia.com/cudnn""")

  if not candidate_explanation:
    print("""
- All required DLLs appear to be present. Please open an issue on the
  TensorFlow GitHub page: https://github.com/tensorflow/tensorflow/issues""")

  sys.exit(-1)

if __name__ == "__main__":
  main()

I get

TensorFlow successfully installed.
The installed version of TensorFlow includes GPU support.

Training in python

When start to train in python, I get some warnings but when I searched it, I can seem to ignore these warnings,

curses is not supported on this machine (please install/reinstall curses for an optimal experience)
WARNING:tensorflow:From C:\Users\Jay\AppData\Local\Programs\Python\Python36\lib\site-packages	flearn\initializations.py:119: UniformUnitScaling.__init__ (from tensorflow.python.ops.init_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.initializers.variance_scaling instead with distribution=uniform to get equivalent behavior.
WARNING:tensorflow:From C:\Users\Jay\AppData\Local\Programs\Python\Python36\lib\site-packages	flearn\objectives.py:66: calling reduce_sum (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.
Instructions for updating:
keep_dims is deprecated, use keepdims instead
---------------------------------
Run id: pygta5-car-fast-0.001-alexnetv2-10-epochs-300K-data.model
Log directory: log/
---------------------------------
Training samples: 1500
Validation samples: 500
--
Training Step: 1  | time: 2.863s
[2K
| Momentum | epoch: 001 | loss: 0.00000 - acc: 0.0000 -- iter: 0064/1500
[A[ATraining Step: 2  | total loss: [1m[32m1.73151[0m[0m | time: 4.523s

Training in cmd

When I train in cmd, I get slighly different messages,

curses is not supported on this machine (please install/reinstall curses for an optimal experience)
WARNING:tensorflow:From C:\Users\Jay\AppData\Local\Programs\Python\Python36\lib\site-packages	flearn\initializations.py:119: UniformUnitScaling.__init__ (from tensorflow.python.ops.init_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.initializers.variance_scaling instead with distribution=uniform to get equivalent behavior.
WARNING:tensorflow:From C:\Users\Jay\AppData\Local\Programs\Python\Python36\lib\site-packages	flearn\objectives.py:66: calling reduce_sum (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.
Instructions for updating:
keep_dims is deprecated, use keepdims instead
2018-05-13 19:13:07.272665: I T:\src\github	ensorflow	ensorflow\core\platform\cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2018-05-13 19:13:07.749663: I T:\src\github	ensorflow	ensorflow\core\common_runtime\gpu\gpu_device.cc:1356] Found device 0 with properties:
name: GeForce GTX 1070 major: 6 minor: 1 memoryClockRate(GHz): 1.7845
pciBusID: 0000:01:00.0
totalMemory: 8.00GiB freeMemory: 6.77GiB
2018-05-13 19:13:07.766329: I T:\src\github	ensorflow	ensorflow\core\common_runtime\gpu\gpu_device.cc:1435] Adding visible gpu devices: 0
2018-05-13 19:13:08.295258: I T:\src\github	ensorflow	ensorflow\core\common_runtime\gpu\gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-05-13 19:13:08.310539: I T:\src\github	ensorflow	ensorflow\core\common_runtime\gpu\gpu_device.cc:929]      0
2018-05-13 19:13:08.317846: I T:\src\github	ensorflow	ensorflow\core\common_runtime\gpu\gpu_device.cc:942] 0:   N
2018-05-13 19:13:08.325655: I T:\src\github	ensorflow	ensorflow\core\common_runtime\gpu\gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6540 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1070, pci bus id: 0000:01:00.0, compute capability: 6.1)
2018-05-13 19:13:09.481654: I T:\src\github	ensorflow	ensorflow\core\common_runtime\gpu\gpu_device.cc:1435] Adding visible gpu devices: 0
2018-05-13 19:13:09.492392: I T:\src\github	ensorflow	ensorflow\core\common_runtime\gpu\gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-05-13 19:13:09.507539: I T:\src\github	ensorflow	ensorflow\core\common_runtime\gpu\gpu_device.cc:929]      0
2018-05-13 19:13:09.514839: I T:\src\github	ensorflow	ensorflow\core\common_runtime\gpu\gpu_device.cc:942] 0:   N
2018-05-13 19:13:09.522600: I T:\src\github	ensorflow	ensorflow\core\common_runtime\gpu\gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6540 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1070, pci bus id: 0000:01:00.0, compute capability: 6.1)
---------------------------------
Run id: pygta5-car-fast-0.001-alexnetv2-10-epochs-300K-data.model
Log directory: log/
---------------------------------
Training samples: 1500
Validation samples: 500
--
Training Step: 1  | time: 2.879s
| Momentum | epoch: 001 | loss: 0.00000 - acc: 0.0000 -- iter: 0064/1500
←[A←[ATraining Step: 2  | total loss: ←[1m←[32m1.60460←[0m←[0m | time: 4.542s

Performance while training

while training, CPU is almost always over 90%, while GPU usage is around 0~25%

Thank you for checking out this long post, I can't seem to locate the problem here to start with. Any help will be greatly appreciated,

Nemorior · Accepted Answer

Are you using your own model or some existing code? If you're using your own implementation the low GPU utilization might be due to your specific implementation. Often, the data pipeline is to blame for this. Check out the performance guide for TF (https://www.tensorflow.org/performance/performance_guide) as well as how to efficiently import data using tf.data.Dataset (https://www.tensorflow.org/programmers_guide/datasets).

If it is not the data pipeline then some of your code is probably executed on the CPU, which results in a lot of copying from GPU to CPU and vice versa. Maybe you perform some of the operations with Numpy or something similar, which only runs on CPU?! Also, other common guidelines apply, e.g. try not to use placeholders and feed_dict, etc...

Overall it would be more helpful if we can see your code or if we knew which model you are using if you are using existing code. Without seeing your code it's hard to know what the problem is. Try downloading the MNIST-CNN example from Tensorflow directly and run it on your computer. It should have a GPU-Util of at least 80%. If this is not the case then mos

Tensorflow 1.8 GPU version doesn't seem to use GPU on windows

Answers (2)

Related Questions

Tensorflow 1.8 GPU version doesn&#39;t seem to use GPU on windows

Answers (2)

Related Questions

Tensorflow 1.8 GPU version doesn't seem to use GPU on windows