Reputation: 2149

GPU not found while using TensorFlow 2.0.0

I am migrating to TensorFlow 2.0. My setup is:

Ubuntu 18.04

CUDA 10.2

Python 3.7

ZOTAC GeForce® GTX 1080 Ti Mini (ZT-P10810G-10P)

When I run nvcc -V and nvidia-smi, I can see the GPU. But the following commands do not list the GPU.

tf.test.is_gpu_available(cuda_only=False, min_cuda_compute_capability=None)
Output: False

tf.config.experimental.list_physical_devices(device_type=None)
Output: [PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'),
PhysicalDevice(name='/physical_device:XLA_CPU:0', device_type='XLA_CPU')]

from keras import backend as K
K.tensorflow_backend._get_available_gpus()
Output: []

Edit 1: The TensorFlow GPU support page has instructions for TensorFlow 1.15 and 1.14, but not for later versions.

Also, rebooting and reinstalling tensorflow-gpu did not help.

Updating CUDA and cuDNN also didn't work.

Upvotes: 0

Answers (3)

Malgo

Reputation: 2149

Yes thank you so much @Ran Fang.

TF 2.0 works on GPU only with CUDA 10.0 and cuDNN 7.4. You can check the dependencies here.

I did all of the above and my versions currently are CUDA 10.0, cuDNN 7.4.1 and NVIDIA-SMI 410.129. You can check the TF-CUDA dependencies and NVIDIA Drivers - CUDA dependencies.

To check the cuDNN version, the following command worked for me on Ubuntu 18.04:

cat /usr/include/cudnn.h | grep CUDNN_MAJOR -A 2

But doing all of the above and just installing tensorflow-gpu did not get the GPUs working.

What finally worked was uninstalling all the TensorFlow packages first and then installing only the nightly GPU preview build:

pip uninstall tensorflow tf-nightly tensorboard tb-nightly tensorflow-estimator

pip install tf-nightly-gpu-2.0-preview

That worked like a charm.

Follow this tutorial to install Tensorflow 2.0 and dependencies accordingly.

You can also go through this, this and this doc to completely uninstall newer CUDA and cuDNN versions and install older CUDA 10.0 version.

Upvotes: 0

Malgo

Reputation: 2149

After doing all of the above, I got an error when running model.fit(..):

UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above. [Op:Conv2D]

I followed the thread here and applied @RadV's answer, which worked. His answer:

Upgrade per the instructions on this TensorFlow GPU instructions page

So now I have,

Ubuntu 18.04

nvidia-smi in the terminal shows CUDA 10.2

which nvcc in the terminal gives /usr/local/cuda-10.0/bin/nvcc
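
The two version numbers above are not in conflict: nvidia-smi reports the highest CUDA version the installed driver supports, while which nvcc points at the toolkit that is actually on the PATH. As a rough sketch, the toolkit version can be read from that path. The helper name toolkit_version_from_path is made up for illustration, and it assumes the default /usr/local/cuda-X.Y install layout:

```python
import re
import shutil


def toolkit_version_from_path(nvcc_path):
    """Extract 'X.Y' from a path like /usr/local/cuda-10.0/bin/nvcc, else None."""
    # Assumption: the toolkit lives in a directory named cuda-X.Y.
    match = re.search(r"/cuda-(\d+\.\d+)/", nvcc_path or "")
    return match.group(1) if match else None


# Path taken from the answer above.
print(toolkit_version_from_path("/usr/local/cuda-10.0/bin/nvcc"))

# Same check against whatever nvcc is on the current PATH (None if not found).
print(toolkit_version_from_path(shutil.which("nvcc")))
```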

Upvotes: 0

Reine_Ran_

Reputation: 672

According to https://www.tensorflow.org/install/source#linux, tensorflow-2.0.0 requires cuDNN 7.4 and CUDA 10.0. I'm not sure what your cuDNN version is; you can check it with:

cat ${CUDNN_H_PATH} | grep CUDNN_MAJOR -A 2

and it should return:

#define CUDNN_MAJOR 7
#define CUDNN_MINOR 5
#define CUDNN_PATCHLEVEL 0
--
#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)

=> this means it's 7.5.0
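
If you would rather not read the defines by eye, here is a minimal sketch that pulls the three numbers out of the header text and compares them with the 7.4 requirement mentioned above. The function name parse_cudnn_version is my own, and the sample text is just the grep output shown above; on a real system you would read the file at your cudnn.h path instead:

```python
import re


def parse_cudnn_version(header_text):
    """Return (major, minor, patch) read from cudnn.h-style #define lines."""
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        pattern = r"^\s*#define\s+" + name + r"\s+(\d+)\s*$"
        match = re.search(pattern, header_text, re.MULTILINE)
        if match is None:
            raise ValueError("missing #define " + name)
        parts.append(int(match.group(1)))
    return tuple(parts)


# Sample input: the grep output quoted in this answer.
sample = """#define CUDNN_MAJOR 7
#define CUDNN_MINOR 5
#define CUDNN_PATCHLEVEL 0
--
#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)
"""

major, minor, patch = parse_cudnn_version(sample)
print("cuDNN %d.%d.%d" % (major, minor, patch))
# tensorflow-2.0.0 is listed with cuDNN 7.4, so compare major.minor only.
print("matches 7.4:", (major, minor) == (7, 4))
```

For the sample above this prints cuDNN 7.5.0 and matches 7.4: False, which is the situation where you would go looking for an archived 7.4 build.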

I've read that if the link above specifies 7.4, you can't use any other version (not even 7.5, 7.6 etc). So look for the archived versions on cuDNN downloads.

I suggest you uninstall all your nvidia drivers, tensorflow-gpu, cuDNN, all cuda libraries and toolkits in your /usr/local folder and do a fresh install. Don't install both tensorflow and tensorflow-gpu. Just install the tensorflow-gpu one.

Here's how to uninstall nvidia-cuda-toolkit and its dependencies:

sudo apt-get remove --auto-remove nvidia-cuda-toolkit

Remember to edit your ~/.bash_profile file

After everything is properly uninstalled and purged, install nvidia-driver-418 (I personally use this version, but according to the nvidia docs, for CUDA 10 as long as it's 410.xx it's okay - nvidia docs tensorflow release notes):

sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt-get update
sudo apt-get install nvidia-driver-418

Then reboot and check that the nvidia-smi command reports the installed driver in its header. On my machine it says: NVIDIA-SMI 430.50 Driver Version 430.50

Then download and install CUDA 10.0

Please don't download any other versions (not 10.1, 10.2, etc - sorry for being naggy)

Just remember to select n (no) when asked to install the NVIDIA Accelerated Graphics Driver. There will be an error message saying it's an incomplete installation, but you can ignore it.

Make sure your ~/.bashrc file includes the cuda-10.0 paths:

export PATH=/usr/local/cuda-10.0/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-10.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
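
After opening a new terminal, you can confirm that the exports took effect. This is only a sketch: check_cuda_env is a name I made up, and the default cuda_root assumes the /usr/local/cuda-10.0 location used above:

```python
import os


def check_cuda_env(env, cuda_root="/usr/local/cuda-10.0"):
    """Report whether PATH and LD_LIBRARY_PATH contain the expected CUDA dirs."""
    report = {}
    for var, subdir in (("PATH", "bin"), ("LD_LIBRARY_PATH", "lib64")):
        wanted = cuda_root + "/" + subdir
        # Split the colon-separated value and ignore empty entries
        # and trailing slashes.
        entries = [e.rstrip("/") for e in env.get(var, "").split(":") if e]
        report[var] = wanted in entries
    return report


# Check the environment of the current shell session.
print(check_cuda_env(dict(os.environ)))
```

Both values should be True once ~/.bashrc has been re-read.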

Then download cuDNN v7.4.2 for CUDA 10.0. After extracting the tgz file with the tar command, cd into the extracted cuda folder and copy the contents of lib64/ into /usr/local/cuda/lib64/ and the contents of include/ into /usr/local/cuda/include/, like so:

sudo cp -P lib64/* /usr/local/cuda/lib64/
sudo cp -P include/* /usr/local/cuda/include/

Check with

tf.test.is_gpu_available()

and it should return True.
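
If you also want to see which devices were picked up, a small sketch along these lines lists them by name. list_gpus is a hypothetical helper of mine; it falls back to the tf.config.experimental call from the question when the non-experimental name is not available, and returns None when TensorFlow cannot be imported at all:

```python
def list_gpus():
    """Return names of GPUs visible to TensorFlow, or None if it is not importable."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    # TF 2.0 exposes this under tf.config.experimental; later 2.x releases
    # also provide tf.config.list_physical_devices.
    lister = getattr(tf.config, "list_physical_devices", None)
    if lister is None:
        lister = tf.config.experimental.list_physical_devices
    return [device.name for device in lister("GPU")]


gpus = list_gpus()
if gpus is None:
    print("TensorFlow is not installed in this environment")
elif gpus:
    print("GPUs visible to TensorFlow:", gpus)
else:
    print("TensorFlow is installed but sees no GPU")
```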

Upvotes: 1
