Reputation: 81
I am trying to use an AlexNet CNN with TensorFlow. I can train the model without any error messages, I have access to TensorBoard, and I can test the trained model with no errors.
The only problem is that while training, GPU usage stays mostly at 0% and only rarely fluctuates up to 25%, while all my CPU cores are working at over 90%. So I am assuming it is using the CPU instead of the GPU.
Here is my setup:
Windows 8.1 x64
GPU: GeForce GTX 1070, driver version 3.88
tensorflow-gpu 1.8.0
CUDA toolkit v9.0
cuDNN version 7
I can import TensorFlow in Python with no errors, and I ran some tests to see if it is installed correctly.
Test 1
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 625346735515728619
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 6911164212
locality {
bus_id: 1
links {
}
}
incarnation: 15764160474642097170
physical_device_desc: "device: 0, name: GeForce GTX 1070, pci bus id: 0000:01:00.0, compute capability: 6.1"
]
Test 2
import tensorflow as tf
hello = tf.constant('Hello, TensorFlow!')
sess = tf.Session()
print(sess.run(hello))
b'Hello, TensorFlow!'
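As an additional check, Test 2 can be extended to log where each op is actually placed (a minimal sketch using the standard TF 1.x log_device_placement session option; the matmul below is just an example op):

import tensorflow as tf

# Ask the session to print each op's assigned device to stderr.
config = tf.ConfigProto(log_device_placement=True)
sess = tf.Session(config=config)

a = tf.constant([[1.0, 2.0], [3.0, 4.0]], name='a')
b = tf.constant([[1.0, 1.0], [0.0, 1.0]], name='b')
print(sess.run(tf.matmul(a, b)))

If the GPU is actually being used, the placement log should map the MatMul op to /device:GPU:0.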
Test 3
import ctypes
import imp
import sys

def main():
    try:
        import tensorflow as tf
        print("TensorFlow successfully installed.")
        if tf.test.is_built_with_cuda():
            print("The installed version of TensorFlow includes GPU support.")
        else:
            print("The installed version of TensorFlow does not include GPU support.")
        sys.exit(0)
    except ImportError:
        print("ERROR: Failed to import the TensorFlow module.")

    candidate_explanation = False

    python_version = sys.version_info.major, sys.version_info.minor
    print("\n- Python version is %d.%d." % python_version)
    if not (python_version == (3, 5) or python_version == (3, 6)):
        candidate_explanation = True
        print("- The official distribution of TensorFlow for Windows requires "
              "Python version 3.5 or 3.6.")

    try:
        _, pathname, _ = imp.find_module("tensorflow")
        print("\n- TensorFlow is installed at: %s" % pathname)
    except ImportError:
        candidate_explanation = False
        print("""
- No module named TensorFlow is installed in this Python environment. You may
install it using the command `pip install tensorflow`.""")

    try:
        msvcp140 = ctypes.WinDLL("msvcp140.dll")
    except OSError:
        candidate_explanation = True
        print("""
- Could not load 'msvcp140.dll'. TensorFlow requires that this DLL be
installed in a directory that is named in your %PATH% environment
variable. You may install this DLL by downloading Microsoft Visual
C++ 2015 Redistributable Update 3 from this URL:
https://www.microsoft.com/en-us/download/details.aspx?id=53587""")

    try:
        cudart64_80 = ctypes.WinDLL("cudart64_80.dll")
    except OSError:
        candidate_explanation = True
        print("""
- Could not load 'cudart64_80.dll'. The GPU version of TensorFlow
requires that this DLL be installed in a directory that is named in
your %PATH% environment variable. Download and install CUDA 8.0 from
this URL: https://developer.nvidia.com/cuda-toolkit""")

    try:
        nvcuda = ctypes.WinDLL("nvcuda.dll")
    except OSError:
        candidate_explanation = True
        print("""
- Could not load 'nvcuda.dll'. The GPU version of TensorFlow requires that
this DLL be installed in a directory that is named in your %PATH%
environment variable. Typically it is installed in 'C:\\Windows\\System32'.
If it is not present, ensure that you have a CUDA-capable GPU with the
correct driver installed.""")

    cudnn5_found = False
    try:
        cudnn5 = ctypes.WinDLL("cudnn64_5.dll")
        cudnn5_found = True
    except OSError:
        candidate_explanation = True
        print("""
- Could not load 'cudnn64_5.dll'. The GPU version of TensorFlow
requires that this DLL be installed in a directory that is named in
your %PATH% environment variable. Note that installing cuDNN is a
separate step from installing CUDA, and it is often found in a
different directory from the CUDA DLLs. You may install the
necessary DLL by downloading cuDNN 5.1 from this URL:
https://developer.nvidia.com/cudnn""")

    cudnn6_found = False
    try:
        cudnn = ctypes.WinDLL("cudnn64_6.dll")
        cudnn6_found = True
    except OSError:
        candidate_explanation = True

    if not cudnn5_found or not cudnn6_found:
        print()
        if not cudnn5_found and not cudnn6_found:
            print("- Could not find cuDNN.")
        elif not cudnn5_found:
            print("- Could not find cuDNN 5.1.")
        else:
            print("- Could not find cuDNN 6.")
        print("""
The GPU version of TensorFlow requires that the correct cuDNN DLL be installed
in a directory that is named in your %PATH% environment variable. Note that
installing cuDNN is a separate step from installing CUDA, and it is often
found in a different directory from the CUDA DLLs. The correct version of
cuDNN depends on your version of TensorFlow:
* TensorFlow 1.2.1 or earlier requires cuDNN 5.1. ('cudnn64_5.dll')
* TensorFlow 1.3 or later requires cuDNN 6. ('cudnn64_6.dll')
You may install the necessary DLL by downloading cuDNN from this URL:
https://developer.nvidia.com/cudnn""")

    if not candidate_explanation:
        print("""
- All required DLLs appear to be present. Please open an issue on the
TensorFlow GitHub page: https://github.com/tensorflow/tensorflow/issues""")

    sys.exit(-1)

if __name__ == "__main__":
    main()
I get
TensorFlow successfully installed.
The installed version of TensorFlow includes GPU support.
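Note that tf.test.is_built_with_cuda() only reports how the binary was compiled, not whether a GPU is usable at runtime; a quick runtime check (standard TF 1.x API) is:

import tensorflow as tf

# Returns e.g. '/device:GPU:0' when a GPU is visible at runtime, '' otherwise.
print(tf.test.gpu_device_name())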
Training in Python
When I start training in Python, I get some warnings, but from what I found when searching, it seems I can safely ignore them:
curses is not supported on this machine (please install/reinstall curses for an optimal experience)
WARNING:tensorflow:From C:\Users\Jay\AppData\Local\Programs\Python\Python36\lib\site-packages\tflearn\initializations.py:119: UniformUnitScaling.__init__ (from tensorflow.python.ops.init_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.initializers.variance_scaling instead with distribution=uniform to get equivalent behavior.
WARNING:tensorflow:From C:\Users\Jay\AppData\Local\Programs\Python\Python36\lib\site-packages\tflearn\objectives.py:66: calling reduce_sum (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.
Instructions for updating:
keep_dims is deprecated, use keepdims instead
---------------------------------
Run id: pygta5-car-fast-0.001-alexnetv2-10-epochs-300K-data.model
Log directory: log/
---------------------------------
Training samples: 1500
Validation samples: 500
--
Training Step: 1 | time: 2.863s
| Momentum | epoch: 001 | loss: 0.00000 - acc: 0.0000 -- iter: 0064/1500
Training Step: 2 | total loss: 1.73151 | time: 4.523s
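As an aside, the keep_dims deprecation warnings above come from tflearn's internals rather than the training code itself; the renamed argument is a drop-in replacement (a trivial illustrative example, not code from this project):

import tensorflow as tf

x = tf.constant([[1.0, 2.0], [3.0, 4.0]])
# Formerly: tf.reduce_sum(x, axis=1, keep_dims=True)
s = tf.reduce_sum(x, axis=1, keepdims=True)  # result shape: (2, 1)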
Training in cmd
When I train in cmd, I get slightly different messages:
curses is not supported on this machine (please install/reinstall curses for an optimal experience)
WARNING:tensorflow:From C:\Users\Jay\AppData\Local\Programs\Python\Python36\lib\site-packages\tflearn\initializations.py:119: UniformUnitScaling.__init__ (from tensorflow.python.ops.init_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.initializers.variance_scaling instead with distribution=uniform to get equivalent behavior.
WARNING:tensorflow:From C:\Users\Jay\AppData\Local\Programs\Python\Python36\lib\site-packages\tflearn\objectives.py:66: calling reduce_sum (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.
Instructions for updating:
keep_dims is deprecated, use keepdims instead
2018-05-13 19:13:07.272665: I T:\src\github\tensorflow\tensorflow\core\platform\cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2018-05-13 19:13:07.749663: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1356] Found device 0 with properties:
name: GeForce GTX 1070 major: 6 minor: 1 memoryClockRate(GHz): 1.7845
pciBusID: 0000:01:00.0
totalMemory: 8.00GiB freeMemory: 6.77GiB
2018-05-13 19:13:07.766329: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1435] Adding visible gpu devices: 0
2018-05-13 19:13:08.295258: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-05-13 19:13:08.310539: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:929] 0
2018-05-13 19:13:08.317846: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:942] 0: N
2018-05-13 19:13:08.325655: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6540 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1070, pci bus id: 0000:01:00.0, compute capability: 6.1)
2018-05-13 19:13:09.481654: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1435] Adding visible gpu devices: 0
2018-05-13 19:13:09.492392: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-05-13 19:13:09.507539: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:929] 0
2018-05-13 19:13:09.514839: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:942] 0: N
2018-05-13 19:13:09.522600: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6540 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1070, pci bus id: 0000:01:00.0, compute capability: 6.1)
---------------------------------
Run id: pygta5-car-fast-0.001-alexnetv2-10-epochs-300K-data.model
Log directory: log/
---------------------------------
Training samples: 1500
Validation samples: 500
--
Training Step: 1 | time: 2.879s
| Momentum | epoch: 001 | loss: 0.00000 - acc: 0.0000 -- iter: 0064/1500
Training Step: 2 | total loss: 1.60460 | time: 4.542s
Performance while training
While training, the CPU is almost always at over 90%, while GPU usage stays around 0-25%.
Thank you for checking out this long post; I can't seem to locate the problem here, so I don't know where to start. Any help will be greatly appreciated.
Upvotes: 1
Views: 2142
Reputation: 2400
Unfortunately, the TensorFlow + CUDA + cuDNN installation is known to be a pain in the ass, especially on Windows.
I would recommend using precompiled wheels from here: https://github.com/fo40225/tensorflow-windows-wheel/
You can pick the latest TensorFlow, CUDA, and cuDNN versions (even combinations not supported by official TF releases), and you can also take advantage of AVX2 instructions by selecting an appropriate build.
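For example, after downloading a wheel matching your Python and CUDA versions, installation is the usual pip step (the filename here is illustrative, not an exact file from that repository):

pip install tensorflow_gpu-1.8.0-cp36-cp36m-win_amd64.whl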
Upvotes: 0
Reputation: 111
Are you using your own model or some existing code? If you're using your own implementation, the low GPU utilization might be due to that specific implementation. Often, the data pipeline is to blame; check out the performance guide for TF (https://www.tensorflow.org/performance/performance_guide) as well as how to efficiently import data using tf.data.Dataset (https://www.tensorflow.org/programmers_guide/datasets).
If it is not the data pipeline, then some of your code is probably executed on the CPU, which results in a lot of copying between GPU and CPU. Maybe you perform some of the operations with NumPy or something similar, which only runs on the CPU? Other common guidelines also apply, e.g. try not to use placeholders and feed_dict; see the sketch below.
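For illustration, a minimal tf.data input pipeline that keeps the GPU fed by overlapping CPU-side preparation with GPU compute (a sketch for TF 1.x; the file name and parse function are hypothetical placeholders):

import tensorflow as tf

def parse_example(record):
    # Hypothetical parser -- adapt the feature spec to your data format.
    features = tf.parse_single_example(record, {
        'image': tf.FixedLenFeature([], tf.string),
        'label': tf.FixedLenFeature([], tf.int64),
    })
    image = tf.decode_raw(features['image'], tf.uint8)
    return image, features['label']

dataset = (tf.data.TFRecordDataset('train.tfrecords')  # hypothetical file
           .map(parse_example, num_parallel_calls=4)   # parse with 4 CPU threads
           .shuffle(1000)
           .batch(64)
           .prefetch(1))                               # prepare next batch while the GPU trains
images, labels = dataset.make_one_shot_iterator().get_next()
# Feed `images`/`labels` straight into the graph instead of placeholders + feed_dict.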
Overall it would be more helpful if we could see your code, or if we knew which model you are using in case it is existing code. Without seeing your code it's hard to tell what the problem is. Try downloading the MNIST CNN example from TensorFlow directly and running it on your machine; it should reach a GPU utilization of at least 80%. If it does not, then most likely something is wrong with your setup.
Upvotes: 1