valegians

Reputation: 940

Tensorflow not running on GPU

I have already spent a considerable amount of time digging around on Stack Overflow and elsewhere looking for the answer, but couldn't find anything.

Hi all,

I am running Tensorflow with Keras on top. I am 90% sure I installed Tensorflow GPU, is there any way to check which install I did?

I was trying to run some CNN models from a Jupyter notebook and I noticed that Keras was running the model on the CPU (checked Task Manager, CPU was at 100%).

I tried running this code from the tensorflow website:

import tensorflow as tf

# Creates a graph.
a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op.
print(sess.run(c))

And this is what I got:

MatMul: (MatMul): /job:localhost/replica:0/task:0/cpu:0
2017-06-29 17:09:38.783183: I c:\tf_jenkins\home\workspace\release-win\m\windows\py\35\tensorflow\core\common_runtime\simple_placer.cc:847] MatMul: (MatMul)/job:localhost/replica:0/task:0/cpu:0
b: (Const): /job:localhost/replica:0/task:0/cpu:0
2017-06-29 17:09:38.784779: I c:\tf_jenkins\home\workspace\release-win\m\windows\py\35\tensorflow\core\common_runtime\simple_placer.cc:847] b: (Const)/job:localhost/replica:0/task:0/cpu:0
a: (Const): /job:localhost/replica:0/task:0/cpu:0
2017-06-29 17:09:38.786128: I c:\tf_jenkins\home\workspace\release-win\m\windows\py\35\tensorflow\core\common_runtime\simple_placer.cc:847] a: (Const)/job:localhost/replica:0/task:0/cpu:0
[[ 22.  28.]
 [ 49.  64.]]

Which to me shows I am running on my CPU, for some reason.

I have a GTX1050 (driver version 382.53). I installed CUDA and cuDNN, and TensorFlow installed without any problems. I installed Visual Studio 2015 as well, since it was listed as a compatible version.

I remember CUDA mentioning something about an incompatible driver being installed, but if I recall correctly CUDA should have installed its own driver.

Edit: I ran these commands to list the available devices

from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())

and this is what I get

[name: "/cpu:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 14922788031522107450
]

and a whole lot of warnings like this

2017-06-29 17:32:45.401429: W c:\tf_jenkins\home\workspace\release-win\m\windows\py\35\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE instructions, but these are available on your machine and could speed up CPU computations.

Edit 2

Tried running

pip3 install --upgrade tensorflow-gpu

and I get

Requirement already up-to-date: tensorflow-gpu in c:\users\xxx\appdata\local\programs\python\python35\lib\site-packages
Requirement already up-to-date: markdown==2.2.0 in c:\users\xxx\appdata\local\programs\python\python35\lib\site-packages (from tensorflow-gpu)
Requirement already up-to-date: html5lib==0.9999999 in c:\users\xxx\appdata\local\programs\python\python35\lib\site-packages (from tensorflow-gpu)
Requirement already up-to-date: werkzeug>=0.11.10 in c:\users\xxx\appdata\local\programs\python\python35\lib\site-packages (from tensorflow-gpu)
Requirement already up-to-date: wheel>=0.26 in c:\users\xxx\appdata\local\programs\python\python35\lib\site-packages (from tensorflow-gpu)
Requirement already up-to-date: bleach==1.5.0 in c:\users\xxx\appdata\local\programs\python\python35\lib\site-packages (from tensorflow-gpu)
Requirement already up-to-date: six>=1.10.0 in c:\users\xxx\appdata\local\programs\python\python35\lib\site-packages (from tensorflow-gpu)
Requirement already up-to-date: protobuf>=3.2.0 in c:\users\xxx\appdata\local\programs\python\python35\lib\site-packages (from tensorflow-gpu)
Requirement already up-to-date: backports.weakref==1.0rc1 in c:\users\xxx\appdata\local\programs\python\python35\lib\site-packages (from tensorflow-gpu)
Requirement already up-to-date: numpy>=1.11.0 in c:\users\xxx\appdata\local\programs\python\python35\lib\site-packages (from tensorflow-gpu)
Requirement already up-to-date: setuptools in c:\users\xxx\appdata\local\programs\python\python35\lib\site-packages (from protobuf>=3.2.0->tensorflow-gpu)

Solved: Check comments for solution. Thanks to all who helped!

I am new to this, so any help is greatly appreciated! Thank you.

Upvotes: 51

Views: 96768

Answers (8)

pfm

Reputation: 6328

To check which devices are available to TensorFlow you can use this and see if the GPU cards are available:

from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())
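On TensorFlow 2.1 and later, the same check can also be made with tf.config.list_physical_devices. The sketch below assumes that API is available and falls back to the device_lib listing above on older releases. The helper name list_gpu_names is made up for illustration.

```python
def list_gpu_names():
    """Return the names of GPUs TensorFlow can see, or None if TensorFlow is not installed."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    try:
        # Assumes TensorFlow 2.1+, where this function lives directly under tf.config.
        return [d.name for d in tf.config.list_physical_devices("GPU")]
    except AttributeError:
        # Older releases: filter the device_lib listing by device type instead.
        from tensorflow.python.client import device_lib
        return [d.name for d in device_lib.list_local_devices()
                if d.device_type == "GPU"]


gpus = list_gpu_names()
if gpus is None:
    print("TensorFlow is not installed in this environment")
elif gpus:
    print("GPUs visible to TensorFlow:", gpus)
else:
    print("TensorFlow sees no GPU; only the CPU will be used")
```

An empty list here corresponds to the CPU-only device listing shown in the question.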

More info

There are also C++ logs available controlled by the TF_CPP_MIN_VLOG_LEVEL env variable, e.g.:

import os
os.environ["TF_CPP_MIN_VLOG_LEVEL"] = "2"

should allow them to be printed when running import tensorflow as tf.

You should see logs like these if you use GPU-enabled TensorFlow with proper access to the GPU machine:

successfully opened CUDA library libcublas.so.*.* locally
successfully opened CUDA library libcudnn.so.*.*  locally
successfully opened CUDA library libcufft.so.*.*  locally

On the other hand, if there are no CUDA libraries in the system / container, you will see:

Could not find cuda drivers on your machine, GPU will not be used.

and where CUDA is installed but no GPU is physically available, TF will import cleanly and error only later, when you run device_lib.list_local_devices(), with this:

failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
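If you capture the start-up log to a file or string, the three cases above can be told apart with a simple substring check. This is only a sketch: the function name and the returned labels are invented here, and the substrings are copied from the messages quoted above, whose exact wording may differ between TensorFlow versions.

```python
def classify_tf_log(log_text):
    """Map TensorFlow start-up log text to one of the cases described above."""
    if "CUDA_ERROR_NO_DEVICE" in log_text:
        # CUDA libraries are present, but no GPU could be initialised.
        return "cuda_installed_but_no_gpu"
    if "Could not find cuda drivers" in log_text:
        # No CUDA libraries in the system / container.
        return "no_cuda_libraries"
    if "successfully opened CUDA library" in log_text:
        # GPU-enabled build with working access to the CUDA libraries.
        return "cuda_libraries_loaded"
    return "unknown"


sample = "failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected"
print(classify_tf_log(sample))  # prints: cuda_installed_but_no_gpu
```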

Upvotes: 44

Marek Vajda

Reputation: 129

If you have problems running TensorFlow on the GPU, check whether you have compatible versions (or any versions at all) of CUDA and cuDNN installed.

Ideally these versions should be exactly the same as those tested to work by the devs here. For example, for tensorflow==2.8.0 you should have CUDA v11.2 and cuDNN v8.1.

Also, you should add the CUDA /bin and /libnvvp folders to the system PATH.

This answer is based on this tutorial: Tensorflow 2021 install tutorial.
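A quick way to confirm the PATH step is to compare the two CUDA folders against the current PATH. The sketch below uses only the standard library. The helper name and the example root folder are hypothetical, so substitute the folder where your CUDA toolkit is actually installed.

```python
import os


def missing_cuda_path_entries(cuda_root, path_value=None):
    """Return the CUDA subfolders (bin, libnvvp) that are not on PATH."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")

    def norm(p):
        # Normalise case and separators so Windows-style paths compare equal.
        return os.path.normcase(os.path.normpath(p))

    on_path = {norm(p) for p in path_value.split(os.pathsep) if p}
    wanted = [os.path.join(cuda_root, sub) for sub in ("bin", "libnvvp")]
    return [w for w in wanted if norm(w) not in on_path]


# Hypothetical install root and PATH value, for illustration only.
example_root = os.path.join("example", "cuda")
example_path = os.pathsep.join([os.path.join(example_root, "bin"), "elsewhere"])
print(missing_cuda_path_entries(example_root, example_path))
```

Calling the function with only the root folder checks the real PATH of the current process. An empty list means both folders are already present.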

Upvotes: 2

mirekphd

Reputation: 6743

You may also have a CUDA version mismatch that needs to be solved one way or the other (downgrading / pinning tensorflow to the latest version supported by your system CUDA is arguably quicker, but only doing the opposite is future-proof).

To verify, check the CUDA version used to build your installed Tensorflow package:

>>> import tensorflow as tf
>>> tf.sysconfig.get_build_info()['cuda_version']
'11.8'

... and compare it with the CUDA version installed on the host / in the container / VM:

>>> import os
>>> os.system("nvcc --version")

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Feb_14_21:12:58_PST_2021
Cuda compilation tools, release 11.2, V11.2.152
Build cuda_11.2.r11.2/compiler.29618528_0
0

More info

When tensorflow imports cleanly (without any warnings) but detects only the CPU on a GPU-equipped machine with CUDA libraries installed, you may also have a CUDA version mismatch between the pre-compiled tensorflow package wheel and the versions installed on the system / in the container.

The CUDA version mismatch above (v11.8 used during Tensorflow compilation vs. the v11.2 CUDA compiler installed in the container) resulted in TF without GPU access, despite nvidia-smi loading correctly.
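The comparison can be automated with a small parser for the nvcc output. This is a minimal sketch with made-up helper names. It treats any difference in major.minor as a mismatch, which fits the case described here but may be stricter than a given TensorFlow build really needs.

```python
import re


def nvcc_release(nvcc_output):
    """Extract the 'release X.Y' version from `nvcc --version` output, or None."""
    match = re.search(r"release\s+(\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def cuda_versions_match(build_version, nvcc_output):
    """Compare major.minor of TensorFlow's build CUDA version with nvcc's release."""
    installed = nvcc_release(nvcc_output)
    if installed is None:
        return False
    return build_version.split(".")[:2] == installed.split(".")[:2]


# Line taken from the nvcc output shown above.
nvcc_text = "Cuda compilation tools, release 11.2, V11.2.152"
print(nvcc_release(nvcc_text))                 # prints: 11.2
print(cuda_versions_match("11.8", nvcc_text))  # prints: False
```

In a live session, the first argument would come from tf.sysconfig.get_build_info()['cuda_version']. The nvcc text can be captured with subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout instead of os.system, which only prints it.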

See also: Tensorflow CUDA compatibility table (tested build configurations).

Upvotes: 2

kushagra deep

Reputation: 522

I ran into a similar problem. I had the following versions of the TensorFlow libraries:

tensorboard               2.4.1              pyhd8ed1ab_1    conda-forge
tensorboard-plugin-wit    1.8.0              pyh44b312d_0    conda-forge
tensorflow                2.4.1            py39hf3d152e_0    conda-forge
tensorflow-base           2.4.1            py39h23a8cbf_0    conda-forge
tensorflow-estimator      2.4.0              pyh9656e83_0    conda-forge
tensorflow-gpu            2.4.1                h30adc30_0

The same versions of the libraries were installed on another machine, where TensorFlow was able to utilise the GPU. The CUDA toolkit and driver versions were the same on both machines (the one where it was working and the one where it wasn't).

It turns out the reason was that tensorflow-gpu=2.4.1 is compatible with Python version 3.8.10. Changing my Python version to 3.8.10 and keeping everything else unchanged worked for me!

Upvotes: 1

Azaria Gebremichael

Reputation: 762

If you happen to be using Anaconda to manage your environments, uninstall all existing versions of tensorflow:

pip uninstall tensorflow
pip3 uninstall tensorflow

Install tensorflow-gpu using conda

conda install tensorflow-gpu

If you don't mind starting from a new environment, though, the easiest way is:

conda create --name tf_gpu tensorflow-gpu

This creates a new conda environment named tf_gpu with tensorflow-gpu installed.
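Before uninstalling anything, it can help to see which TensorFlow distributions are actually present in the active environment. The sketch below uses importlib.metadata from the standard library (Python 3.8+). The helper name and the list of package names to probe are assumptions, and packages installed without standard metadata would not show up.

```python
from importlib import metadata


def installed_tensorflow_packages(names=("tensorflow", "tensorflow-gpu", "tensorflow-cpu")):
    """Return {distribution name: version} for each listed package that is installed."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Not installed in this environment; skip it.
            continue
    return found


print(installed_tensorflow_packages())
```

If both tensorflow and tensorflow-gpu appear in the result, the environment contains the kind of mixed install that the uninstall step above is meant to clean up.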

Upvotes: 2

Sachin Mohan

Reputation: 1413

For me the following worked.

I used a conda environment, since a plain Python environment meant setting LD_LIBRARY_PATH and installing CUDA manually, which is another mess.

In the mentioned blog, the author installed cudatoolkit and cudnn inside conda and then installed tensorflow-gpu, which fixed the problem.

P.S. As far as I have read, cudatoolkit and cudnn play a huge role in getting your code running on tensorflow-gpu.

Upvotes: 1

QtRoS

Reputation: 1177

It may sound dumb, but try rebooting. It helped me and some other folks on GitHub.

Upvotes: 22

CoinsWorth

Reputation: 123

I was still having trouble getting GPU support even after correctly installing tensorflow-gpu via pip. My problem was that I had installed tensorflow 1.5 and CUDA 9.1 (the default version Nvidia directs you to), whereas the precompiled tensorflow 1.5 works with CUDA versions <= 9.0. Here is the download page on Nvidia's site to get the correct CUDA 9.0:

https://developer.nvidia.com/cuda-90-download-archive

Also make sure to update your cuDNN to a version compatible with CUDA 9.0: https://developer.nvidia.com/cudnn https://developer.nvidia.com/rdp/cudnn-download

Upvotes: 10
