Reputation: 147
I am a beginner when it comes to executing Python code on a GPU. I have a CNN that I would like to run on my GPU. I have tensorflow-gpu, CUDA and cuDNN installed on my laptop, but the Python code doesn't execute on the GPU.
Below is everything I tried, along with the output.
Code:
pip freeze | grep tensorflow
Output:
tensorflow==2.0.0
tensorflow-estimator==2.0.0
tensorflow-gpu==2.0.0
Code:
nvcc --version
Output:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Fri_Feb__8_19:08:17_PST_2019
Cuda compilation tools, release 10.1, V10.1.105
Code:
cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2
Output:
#define CUDNN_MAJOR 7
#define CUDNN_MINOR 5
#define CUDNN_PATCHLEVEL 0
#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)
#include "driver_types.h"
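The header's own CUDNN_VERSION formula combines the three macros, so the output above corresponds to cuDNN 7.5.0, i.e. 7 * 1000 + 5 * 100 + 0 = 7500. The same check can be scripted in Python; this is a minimal sketch that assumes the header path used above, and cudnn_version_from_header is a hypothetical helper name. Newer cuDNN releases (8.x) moved these macros into cudnn_version.h, in which case the function raises ValueError for cudnn.h.

```python
import os
import re


def cudnn_version_from_header(header_text):
    """Parse the cuDNN version macros out of the text of cudnn.h.

    Returns (major, minor, patch, combined), where combined follows the
    header's own formula: MAJOR * 1000 + MINOR * 100 + PATCHLEVEL.
    """
    values = {}
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        # Match lines such as "#define CUDNN_MAJOR 7"
        match = re.search(r"#define\s+" + name + r"\s+(\d+)", header_text)
        if match is None:
            raise ValueError(name + " not found in header text")
        values[name] = int(match.group(1))
    major = values["CUDNN_MAJOR"]
    minor = values["CUDNN_MINOR"]
    patch = values["CUDNN_PATCHLEVEL"]
    return major, minor, patch, major * 1000 + minor * 100 + patch


if __name__ == "__main__":
    # Header location used in the question; adjust for your installation.
    header_path = "/usr/local/cuda/include/cudnn.h"
    if os.path.exists(header_path):
        with open(header_path, errors="replace") as f:
            print(cudnn_version_from_header(f.read()))
    else:
        print("cudnn.h not found at", header_path)
```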
Code:
from __future__ import absolute_import, division, print_function, unicode_literals
import tensorflow as tf
print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU')))
Output:
Num GPUs Available: 0
Code:
import tensorflow
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())
Output:
2019-10-16 22:11:15.280922: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-10-16 22:11:15.484734: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2808000000 Hz
2019-10-16 22:11:15.508127: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x45d4c60 executing computations on platform Host. Devices:
2019-10-16 22:11:15.508212: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): Host, Default Version
2019-10-16 22:11:15.784006: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-10-16 22:11:15.785226: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x45d6ad0 executing computations on platform CUDA. Devices:
2019-10-16 22:11:15.785278: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): GeForce GTX 1060, Compute Capability 6.1
2019-10-16 22:11:15.785605: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-10-16 22:11:15.786528: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce GTX 1060 major: 6 minor: 1 memoryClockRate(GHz): 1.6705
pciBusID: 0000:01:00.0
2019-10-16 22:11:15.786826: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/ros/melodic/lib:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda-10.1/lib64/
2019-10-16 22:11:15.787053: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcublas.so.10.0'; dlerror: libcublas.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/ros/melodic/lib:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda-10.1/lib64/
2019-10-16 22:11:15.787266: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcufft.so.10.0'; dlerror: libcufft.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/ros/melodic/lib:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda-10.1/lib64/
2019-10-16 22:11:15.787474: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcurand.so.10.0'; dlerror: libcurand.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/ros/melodic/lib:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda-10.1/lib64/
2019-10-16 22:11:15.787682: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcusolver.so.10.0'; dlerror: libcusolver.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/ros/melodic/lib:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda-10.1/lib64/
2019-10-16 22:11:15.787950: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcusparse.so.10.0'; dlerror: libcusparse.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/ros/melodic/lib:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda-10.1/lib64/
2019-10-16 22:11:15.788010: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2019-10-16 22:11:15.788036: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1641] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2019-10-16 22:11:15.788073: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-10-16 22:11:15.788094: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165] 0
2019-10-16 22:11:15.788111: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0: N
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 7400412130462543104
,name: "/device:XLA_CPU:0"
device_type: "XLA_CPU"
memory_limit: 17179869184
locality {
}
incarnation: 10419596086097903998
physical_device_desc: "device: XLA_CPU device"
,name: "/device:XLA_GPU:0"
device_type: "XLA_GPU"
memory_limit: 17179869184
locality {
}
incarnation: 10970348491339008844
physical_device_desc: "device: XLA_GPU device"
]
I have referred to several websites which basically say that if you have a GPU and tensorflow-gpu installed, the program will automatically detect the GPU and run the code on it. I also know that there are similar questions on StackOverflow, and the code above is taken from answers to those questions. I also tried this example from the official TensorFlow 2.0 website:
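import tensorflow as tf
# Must be called before any other TensorFlow operation runs in this process
tf.debugging.set_log_device_placement(True)
# Create some tensors
a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
c = tf.matmul(a, b)
print(c)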
Output is:
RuntimeError: Device placement logging must be set at program startup
Why is my program not executing on gpu?
Upvotes: 0
Views: 6716
Reputation: 2654
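If any of the above doesn't work, try installing tensorflow-gpu with conda instead of pip. For some reason, pip install tensorflow-gpu doesn't always work as expected; one likely difference is that the conda package also installs matching cudatoolkit and cudnn packages into the environment, so it does not depend on the system-wide CUDA version.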
conda install tensorflow-gpu
Upvotes: 0
Reputation: 147
Rishabh Sahrawat's answer worked for me. It took me a very long time to figure out how to uninstall CUDA 10.1 and install CUDA 10.0. Although that answer is informative, I still struggled to get all the installations right: I kept getting package errors (sigh), NVIDIA driver errors, dpkg errors, and so on. I thought it would be useful to gather everything in one place to guide other beginners like me who may be facing the same difficulties. The following commands fixed the errors for me. Some of them are already mentioned in the question, but I have included them here too. I hope this helps.
1. How to uninstall CUDA?
dpkg -l | grep cuda- | awk '{print $2}' | xargs -n1 sudo dpkg --purge --force-all
sudo apt-get remove "cuda-*"
2. How to check whether CUDA is installed or uninstalled?
Command:
nvcc --version
Output (if uninstalled):
command 'nvcc' not found, but can be installed with sudo apt install nvidia-cuda-toolkit
Output (if installed):
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
3. In case of the error "bash: /usr/bin/nvcc: No such file or directory":
Check the PATH set in .bashrc. One can also refer to this link.
4. How to remove an old NVIDIA driver version?
Command:
sudo apt-get --purge remove "*nvidia*"
5. How to check if the driver is installed?
Command:
nvidia-smi
6. In case of the error message “Sub-process /usr/bin/dpkg returned an error code (1)”:
One can also try:
sudo apt-get install freeglut3 freeglut3-dev libxi-dev libxmu-dev
apt --fix-broken install  # (if it doesn't work, try running it as root)
7. How to install CUDA?
I used the following command instead of step 4 of the CUDA installation instructions:
sudo apt-get install cuda-10-0
8. How to install cuDNN?
Download the cuDNN Library for Linux (the build for CUDA 10.0) from NVIDIA, then:
# Unpack the archive
tar -zxvf cudnn-10.0-linux-x64-v7.6.4.38.tgz
# Move the unpacked contents to your CUDA directory
sudo cp -P cuda/lib64/libcudnn* /usr/local/cuda-10.0/lib64/
sudo cp cuda/include/cudnn.h /usr/local/cuda-10.0/include/
# Give read access to all users
sudo chmod a+r /usr/local/cuda-10.0/include/cudnn.h /usr/local/cuda-10.0/lib64/libcudnn*
One can also use the following links (they did not work for me, but they are worth trying):
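The checks in steps 2 and 5 can also be scripted, for example to confirm that the CUDA release on PATH is the 10.0 that TensorFlow 2.0 expects according to the answers in this thread. This is a minimal sketch, assuming nvcc and nvidia-smi are on PATH; the function names are hypothetical helpers. Keep in mind that nvcc only tells you which toolkit is on PATH, while TensorFlow itself loads the CUDA libraries through the dynamic loader.

```python
import re
import shutil
import subprocess

# Per the answers in this thread, TensorFlow 2.0 expects CUDA 10.0.
EXPECTED_CUDA_RELEASE = "10.0"


def parse_nvcc_release(text):
    """Return the 'release X.Y' number from `nvcc --version` output, or None."""
    match = re.search(r"release (\d+\.\d+)", text)
    return match.group(1) if match else None


def installed_cuda_release():
    """Run the nvcc found on PATH and return its release, or None if nvcc is missing."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None  # nvcc is not on PATH (see step 3 about .bashrc)
    result = subprocess.run([nvcc, "--version"], stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, universal_newlines=True)
    return parse_nvcc_release(result.stdout)


def nvidia_driver_version():
    """Return the driver version reported by nvidia-smi, or None if unavailable."""
    smi = shutil.which("nvidia-smi")
    if smi is None:
        return None
    result = subprocess.run([smi, "--query-gpu=driver_version", "--format=csv,noheader"],
                            stdout=subprocess.PIPE, stderr=subprocess.DEVNULL,
                            universal_newlines=True)
    lines = result.stdout.strip().splitlines()
    if result.returncode != 0 or not lines:
        return None
    return lines[0].strip()  # first GPU's driver version


if __name__ == "__main__":
    release = installed_cuda_release()
    print("nvcc release:", release)
    print("NVIDIA driver:", nvidia_driver_version())
    if release is not None and release != EXPECTED_CUDA_RELEASE:
        print("Warning: TensorFlow 2.0 expects CUDA", EXPECTED_CUDA_RELEASE)
```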
Once everything is installed and the tensorflow package is uninstalled (keep only tensorflow-gpu), the code will run on the GPU.
See also: How to ensure tensorflow is using the GPU
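To confirm from Python that TensorFlow actually registered the GPU, the device-listing call from the question can be wrapped with a couple of extra facts. A minimal sketch, assuming TensorFlow 2.x; tensorflow_gpu_report is a hypothetical helper, and it returns None instead of failing when TensorFlow is not installed.

```python
import importlib.util


def tensorflow_gpu_report():
    """Summarize what TensorFlow can see, or return None if it is not installed."""
    if importlib.util.find_spec("tensorflow") is None:
        return None
    import tensorflow as tf
    gpus = tf.config.experimental.list_physical_devices("GPU")
    return {
        "version": tf.__version__,
        "built_with_cuda": tf.test.is_built_with_cuda(),
        "gpus": [gpu.name for gpu in gpus],
    }


if __name__ == "__main__":
    report = tensorflow_gpu_report()
    if report is None:
        print("TensorFlow is not installed in this environment")
    elif not report["gpus"]:
        print("TensorFlow", report["version"], "sees no GPU; look for "
              "'Could not load dynamic library' warnings in its log")
    else:
        print("GPUs visible to TensorFlow:", report["gpus"])
```

An empty GPU list together with built_with_cuda == True points at missing CUDA libraries rather than at a CPU-only build.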
Note: if you get an import error when importing tensorflow, this is what worked for me:
pip uninstall tensorflow
pip uninstall tensorflow-gpu
pip install tensorflow-gpu
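To check which TensorFlow distributions pip has installed (the question's pip freeze shows both tensorflow and tensorflow-gpu), the standard library can report installed versions. A minimal sketch, assuming Python 3.8+ for importlib.metadata; installed_distributions is a hypothetical helper.

```python
from importlib import metadata  # standard library in Python 3.8+


def installed_distributions(names=("tensorflow", "tensorflow-gpu")):
    """Map each pip distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    versions = installed_distributions()
    print(versions)
    if versions["tensorflow"] and versions["tensorflow-gpu"]:
        print("Both tensorflow and tensorflow-gpu are installed; "
              "the answers in this thread suggest keeping only tensorflow-gpu.")
```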
Additional information:
1. To check the Ubuntu kernel version:
uname -sr
uname -r
uname -a
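The same kernel information is available from Python's standard platform module, which can be handy when collecting system details in a script. A minimal sketch; kernel_name_and_release is a hypothetical helper.

```python
import platform


def kernel_name_and_release():
    """Return (kernel name, kernel release), the two fields `uname -sr` prints."""
    info = platform.uname()
    return info.system, info.release


if __name__ == "__main__":
    system, release = kernel_name_and_release()
    print(system, release)  # comparable to `uname -sr`
    print(release)          # comparable to `uname -r`
```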
Enjoy :)
Upvotes: 0
Reputation: 2507
If you look here:
2019-10-16 22:11:15.786826: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/ros/melodic/lib:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda-10.1/lib64/
2019-10-16 22:11:15.787053: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcublas.so.10.0'; dlerror: libcublas.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/ros/melodic/lib:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda-10.1/lib64/
2019-10-16 22:11:15.787266: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcufft.so.10.0'; dlerror: libcufft.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/ros/melodic/lib:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda-10.1/lib64/
2019-10-16 22:11:15.787474: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcurand.so.10.0'; dlerror: libcurand.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/ros/melodic/lib:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda-10.1/lib64/
2019-10-16 22:11:15.787682: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcusolver.so.10.0'; dlerror: libcusolver.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/ros/melodic/lib:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda-10.1/lib64/
2019-10-16 22:11:15.787950: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcusparse.so.10.0'; dlerror: libcusparse.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/ros/melodic/lib:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda-10.1/lib64/
It shows that TensorFlow is looking for the CUDA 10.0 libraries (the *.so.10.0 files), but what you have installed is CUDA 10.1 (note /usr/local/cuda-10.1/lib64/ in LD_LIBRARY_PATH). So the first step would be to uninstall and remove CUDA 10.1 and install CUDA 10.0. Also remove tensorflow, and keep only tensorflow-gpu.
For all the other versions, follow the exact suggestions here.
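The version mismatch can be confirmed without starting TensorFlow by asking the dynamic loader for the same library names listed in the warnings. A minimal sketch, assuming Linux; check_loadable is a hypothetical helper, not part of TensorFlow. With only CUDA 10.1 installed, the *.so.10.0 entries would be expected to show as missing.

```python
import ctypes

# Library names taken from the "Could not load dynamic library" warnings above;
# libcudnn.so.7 is included because the same log reports it loaded successfully.
TF_2_0_GPU_LIBRARIES = [
    "libcudart.so.10.0",
    "libcublas.so.10.0",
    "libcufft.so.10.0",
    "libcurand.so.10.0",
    "libcusolver.so.10.0",
    "libcusparse.so.10.0",
    "libcudnn.so.7",
]


def check_loadable(names):
    """Try to dlopen each shared library by name; return {name: True/False}.

    ctypes.CDLL uses the system dynamic loader, so the search follows the
    same LD_LIBRARY_PATH / ldconfig rules as a plain dlopen of that name.
    LD_LIBRARY_PATH is read when the process starts, so set it in the shell
    (e.g. in .bashrc) rather than from inside Python.
    """
    status = {}
    for name in names:
        try:
            ctypes.CDLL(name)
            status[name] = True
        except OSError:
            status[name] = False
    return status


if __name__ == "__main__":
    for name, ok in check_loadable(TF_2_0_GPU_LIBRARIES).items():
        print("%-22s %s" % (name, "found" if ok else "MISSING"))
```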
Let us know if that solves your issue.
Upvotes: 3