Rao208

Reputation: 147

Why is the Python code not running on the GPU? tensorflow-gpu, CUDA, cuDNN installed

I am a beginner when it comes to running Python code on a GPU. I have CNN code that I would like to run on the GPU. I have tensorflow-gpu, CUDA, and cuDNN installed on my laptop, but the Python code doesn't execute on the GPU.

[Screenshot: nvidia-smi output]

Below is everything I tried, along with the output:

  1. Code:

    pip freeze | grep tensorflow
    

    Output:

    tensorflow==2.0.0
    tensorflow-estimator==2.0.0
    tensorflow-gpu==2.0.0
    
  2. Code:

    nvcc --version
    

    Output:

    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2019 NVIDIA Corporation
    Built on Fri_Feb__8_19:08:17_PST_2019
    Cuda compilation tools, release 10.1, V10.1.105
    
  3. Code:

    cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2
    

    Output:

    #define CUDNN_MAJOR 7
    #define CUDNN_MINOR 5
    #define CUDNN_PATCHLEVEL 0
    #define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)
    #include "driver_types.h"
    
  4. Code:

    from __future__ import absolute_import, division, print_function, unicode_literals
    import tensorflow as tf
    
    print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU')))
    

    Output:

    Num GPUs Available:  0
    
  5. Code:

    import tensorflow
    from tensorflow.python.client import device_lib
    print(device_lib.list_local_devices())
    

    Output:

    2019-10-16 22:11:15.280922: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
    2019-10-16 22:11:15.484734: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2808000000 Hz
    2019-10-16 22:11:15.508127: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x45d4c60 executing computations on platform Host. Devices:
    2019-10-16 22:11:15.508212: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): Host, Default Version
    2019-10-16 22:11:15.784006: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
    2019-10-16 22:11:15.785226: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x45d6ad0 executing computations on platform CUDA. Devices:
    2019-10-16 22:11:15.785278: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): GeForce GTX 1060, Compute Capability 6.1
    2019-10-16 22:11:15.785605: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
    2019-10-16 22:11:15.786528: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties: 
    name: GeForce GTX 1060 major: 6 minor: 1 memoryClockRate(GHz): 1.6705
    pciBusID: 0000:01:00.0
    2019-10-16 22:11:15.786826: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/ros/melodic/lib:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda-10.1/lib64/
    2019-10-16 22:11:15.787053: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcublas.so.10.0'; dlerror: libcublas.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/ros/melodic/lib:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda-10.1/lib64/
    2019-10-16 22:11:15.787266: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcufft.so.10.0'; dlerror: libcufft.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/ros/melodic/lib:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda-10.1/lib64/
    2019-10-16 22:11:15.787474: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcurand.so.10.0'; dlerror: libcurand.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/ros/melodic/lib:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda-10.1/lib64/
    2019-10-16 22:11:15.787682: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcusolver.so.10.0'; dlerror: libcusolver.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/ros/melodic/lib:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda-10.1/lib64/
    2019-10-16 22:11:15.787950: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcusparse.so.10.0'; dlerror: libcusparse.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/ros/melodic/lib:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda-10.1/lib64/
    2019-10-16 22:11:15.788010: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
    2019-10-16 22:11:15.788036: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1641] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
    Skipping registering GPU devices...
    2019-10-16 22:11:15.788073: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] 
    Device interconnect StreamExecutor with strength 1 edge matrix:
    2019-10-16 22:11:15.788094: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]      0 
    2019-10-16 22:11:15.788111: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0:   N 
    [name: "/device:CPU:0"
    device_type: "CPU"
    memory_limit: 268435456
    locality {
    }
    incarnation: 7400412130462543104
    ,name: "/device:XLA_CPU:0"
    
    device_type: "XLA_CPU"
    memory_limit: 17179869184
    locality {
    }
    incarnation: 10419596086097903998
    physical_device_desc: "device: XLA_CPU device"
    ,name: "/device:XLA_GPU:0"
    device_type: "XLA_GPU"
    memory_limit: 17179869184
    locality {
    }
    incarnation: 10970348491339008844
    physical_device_desc: "device: XLA_GPU device"
    ]
    

I have referred to several websites, which basically say that if you have a GPU and tensorflow-gpu installed, the program will automatically detect the GPU and run the code on it. I also know that there are similar questions on StackOverflow; the code above comes from the answers to those questions. I also tried this snippet from the official TensorFlow 2.0 website:

import tensorflow as tf

tf.debugging.set_log_device_placement(True)

# Create some tensors
a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
c = tf.matmul(a, b)

print(c)

Output is:

RuntimeError: Device placement logging must be set at program startup

Why is my program not executing on the GPU?

Upvotes: 0

Views: 6716

Answers (3)

salhin

Reputation: 2654

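If none of the above works, try installing tensorflow-gpu with conda instead of pip. The conda package pulls in a matching CUDA toolkit and cuDNN as dependencies, whereas pip install tensorflow-gpu relies on whatever CUDA libraries are already on the system, so it fails when the versions don't match.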

conda install tensorflow-gpu

Upvotes: 0

Rao208

Reputation: 147

Rishabh Sahrawat's answer worked for me. It took me a very long time to figure out how to uninstall CUDA 10.1 and install CUDA 10.0. While that answer is pretty informative, I still struggled to get all the installations right: I kept running into package errors (sigh), NVIDIA driver errors, dpkg errors, and so on. I thought it would be nice to gather everything in one place to guide others (beginners like me) who are probably facing the same difficulties. The commands below fixed the errors for me. Some of them are already mentioned in the question, but I have repeated them here for completeness. I hope this helps.

1. How to uninstall CUDA?

dpkg -l | grep cuda- | awk '{print $2}' | xargs -n1 sudo dpkg --purge --force-all
sudo apt-get remove "cuda-*"

2. How to check whether CUDA is installed or uninstalled?

Command:

nvcc --version

Output (if uninstalled)

command 'nvcc' not found, but can be installed with sudo apt install nvidia-cuda-toolkit

Output (if installed)

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
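
The same check can be scripted. The following is a minimal sketch, assuming nvcc prints a line containing "release X.Y" as in the output above; the function names parse_nvcc_release and installed_cuda_release are made up for this illustration.

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_nvcc_release(text: str) -> Optional[str]:
    """Return the CUDA release (e.g. '10.0') found in `nvcc --version` output, or None."""
    match = re.search(r"release\s+(\d+\.\d+)", text)
    return match.group(1) if match else None


def installed_cuda_release() -> Optional[str]:
    """Run nvcc if it is on PATH and return its release string, else None."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None  # nvcc not found: toolkit missing or PATH not set
    result = subprocess.run([nvcc, "--version"], capture_output=True, text=True)
    return parse_nvcc_release(result.stdout)


if __name__ == "__main__":
    print("CUDA release reported by nvcc:", installed_cuda_release())
```

A result of None only means nvcc is not on PATH; the toolkit may still be installed under a directory such as /usr/local/cuda-10.0/bin that is not exported in .bashrc.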

3. In case of the error bash: /usr/bin/nvcc: No such file or directory

Check the path in .bashrc. One can also refer to this link

4. How to remove NVIDIA driver old version?

Command

sudo apt-get --purge remove "*nvidia*"

5. How to check if the driver is installed?

Command

nvidia-smi

6. In case of the error message “Sub-process /usr/bin/dpkg returned an error code (1)”

dpkg error

One can also try:

sudo apt-get install freeglut3 freeglut3-dev libxi-dev libxmu-dev
apt --fix-broken install # (if this fails, rerun it as root, e.g. with sudo)

7. How to install CUDA?

I used the following command in place of step 4 of the CUDA installation instructions:

sudo apt-get install cuda-10-0

8. How to install cuDNN?

Download cuDNN Library for Linux

# Unpack the archive

tar -zxvf cudnn-10.0-linux-x64-v7.6.4.38.tgz

# Move the unpacked contents to your CUDA directory

sudo cp -P cuda/lib64/libcudnn* /usr/local/cuda-10.0/lib64/
sudo cp  cuda/include/cudnn.h /usr/local/cuda-10.0/include/

# Give read access to all users

sudo chmod a+r /usr/local/cuda-10.0/include/cudnn.h /usr/local/cuda-10.0/lib64/libcudnn*
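
To confirm which cuDNN version ended up in the CUDA directory, the header can be read directly. This is a rough sketch that assumes the version macros live in cudnn.h, as they do for the cuDNN 7 archive used above (newer cuDNN releases keep them in a separate cudnn_version.h); cudnn_version_from_header is a hypothetical helper name.

```python
import re
from pathlib import Path
from typing import Optional


def cudnn_version_from_header(header_text: str) -> Optional[str]:
    """Return 'MAJOR.MINOR.PATCH' parsed from cudnn.h contents, or None if any part is missing."""
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(rf"^\s*#\s*define\s+{name}\s+(\d+)", header_text, re.MULTILINE)
        if match is None:
            return None
        parts.append(match.group(1))
    return ".".join(parts)


if __name__ == "__main__":
    # Path taken from the copy commands above; adjust it if CUDA lives elsewhere.
    header = Path("/usr/local/cuda-10.0/include/cudnn.h")
    if header.is_file():
        print("cuDNN version:", cudnn_version_from_header(header.read_text(errors="ignore")))
    else:
        print("cudnn.h not found at", header)
```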

One can also use the following links (they did not work for me, but they are worth trying):

  1. I ended up installing CUDA 10.1 by following the steps in the link.
  2. I could not create a new file, /etc/profile.d/cuda.sh as suggested in this link
  3. This link is good too.

Once everything is installed and tensorflow is uninstalled (keep only tensorflow-gpu), the code will run on the GPU.

How to ensure tensorflow is using the GPU
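
Before even importing tensorflow, one can check whether the shared libraries it asked for in the question's log are visible. This is only a sketch: it looks at LD_LIBRARY_PATH alone, whereas the dynamic loader also consults the system library cache and default directories, so a "not found" here is a hint rather than proof. locate_libraries is an illustrative name, not part of any library.

```python
import os
from typing import Dict, Iterable, Optional


def locate_libraries(names: Iterable[str], search_path: str) -> Dict[str, Optional[str]]:
    """Map each library file name to the first directory in search_path that holds it (or None)."""
    directories = [d for d in search_path.split(os.pathsep) if d]
    found: Dict[str, Optional[str]] = {}
    for name in names:
        found[name] = next(
            (d for d in directories if os.path.exists(os.path.join(d, name))), None
        )
    return found


if __name__ == "__main__":
    # Library names copied from the warnings in the question's log.
    wanted = [
        "libcudart.so.10.0", "libcublas.so.10.0", "libcufft.so.10.0",
        "libcurand.so.10.0", "libcusolver.so.10.0", "libcusparse.so.10.0",
        "libcudnn.so.7",
    ]
    search_path = os.environ.get("LD_LIBRARY_PATH", "")
    for lib, where in locate_libraries(wanted, search_path).items():
        print(f"{lib}: {where or 'NOT FOUND on LD_LIBRARY_PATH'}")
```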

Note: if you get an import error while importing tensorflow, uninstalling both packages and reinstalling only tensorflow-gpu worked for me:

pip uninstall tensorflow
pip uninstall tensorflow-gpu

pip install tensorflow-gpu
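
To see at a glance whether both packages are still installed side by side (the situation shown by pip freeze in the question), the freeze output can be scanned. A small sketch; tensorflow_packages and has_conflict are hypothetical helper names, and the sample text is the pip freeze output quoted in the question.

```python
from typing import Dict


def tensorflow_packages(freeze_output: str) -> Dict[str, str]:
    """Return {package: version} for the tensorflow and tensorflow-gpu lines of `pip freeze`."""
    packages = {}
    for line in freeze_output.splitlines():
        name, sep, version = line.strip().partition("==")
        if sep and name.lower() in ("tensorflow", "tensorflow-gpu"):
            packages[name.lower()] = version
    return packages


def has_conflict(freeze_output: str) -> bool:
    """True when both the CPU-only and the GPU package are installed at the same time."""
    return len(tensorflow_packages(freeze_output)) == 2


if __name__ == "__main__":
    # Sample taken from the question; replace it with the real `pip freeze` output.
    sample = "tensorflow==2.0.0\ntensorflow-estimator==2.0.0\ntensorflow-gpu==2.0.0\n"
    print(tensorflow_packages(sample))
    print("both installed:", has_conflict(sample))
```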

Additional information:

1. To check Ubuntu kernel version:

uname -sr
uname -r
uname -a

2. To install the GCC

Enjoy :)

Upvotes: 0

Rishabh Sahrawat

Reputation: 2507

If you look here:

2019-10-16 22:11:15.786826: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/ros/melodic/lib:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda-10.1/lib64/
2019-10-16 22:11:15.787053: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcublas.so.10.0'; dlerror: libcublas.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/ros/melodic/lib:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda-10.1/lib64/
2019-10-16 22:11:15.787266: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcufft.so.10.0'; dlerror: libcufft.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/ros/melodic/lib:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda-10.1/lib64/
2019-10-16 22:11:15.787474: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcurand.so.10.0'; dlerror: libcurand.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/ros/melodic/lib:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda-10.1/lib64/
2019-10-16 22:11:15.787682: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcusolver.so.10.0'; dlerror: libcusolver.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/ros/melodic/lib:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda-10.1/lib64/
2019-10-16 22:11:15.787950: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcusparse.so.10.0'; dlerror: libcusparse.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/ros/melodic/lib:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda-10.1/lib64/

The log says TensorFlow is looking for the CUDA 10.0 libraries (libcudart.so.10.0, libcublas.so.10.0, and so on), but only CUDA 10.1 is installed. So the first step is to uninstall CUDA 10.1 and install CUDA 10.0. Also remove tensorflow and keep only tensorflow-gpu. For all the other versions, follow the exact suggestions here.
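
Reading the required version off the log can also be automated. Here is a minimal sketch, assuming the warnings keep the "Could not load dynamic library '<name>'" wording shown above; missing_libraries and required_versions are names invented for this example.

```python
import re
from typing import List

_MISSING = re.compile(r"Could not load dynamic library '([^']+)'")


def missing_libraries(log_text: str) -> List[str]:
    """Return the library names TensorFlow could not load, in log order, without duplicates."""
    seen: List[str] = []
    for name in _MISSING.findall(log_text):
        if name not in seen:
            seen.append(name)
    return seen


def required_versions(libraries: List[str]) -> List[str]:
    """Return the sorted version suffixes, taking '10.0' from a name like libcudart.so.10.0."""
    versions = set()
    for name in libraries:
        match = re.search(r"\.so\.(\d+(?:\.\d+)*)$", name)
        if match:
            versions.add(match.group(1))
    return sorted(versions)


if __name__ == "__main__":
    # Shortened sample in the style of the log above; paste the full log here instead.
    log = (
        "W dso_loader.cc:55] Could not load dynamic library 'libcudart.so.10.0'; dlerror: ...\n"
        "W dso_loader.cc:55] Could not load dynamic library 'libcublas.so.10.0'; dlerror: ...\n"
    )
    libs = missing_libraries(log)
    print("missing:", libs)
    print("versions TensorFlow asked for:", required_versions(libs))
```

If every missing library carries the same suffix and nvcc reports a different release, the installed CUDA version is most likely the mismatch.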

Let us know if that solves your issue.

Upvotes: 3
