Reputation: 1232
I am trying to use Tensorflow 2.7.0 with GPU, but I am constantly running into the same issue:
2022-02-03 08:32:31.822484: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/username/.cache/pypoetry/virtualenvs/poetry_env/lib/python3.7/site-packages/cv2/../../lib64:/home/username/miniconda3/envs/project/lib/
2022-02-03 08:32:31.822528: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
This issue has already appeared multiple times here and on GitHub. However, the proposed solutions are usually to a) download the missing CUDA files, b) downgrade/upgrade to the correct CUDA version, or c) set the correct LD_LIBRARY_PATH.
I have already been using this PC with CUDA-enabled PyTorch, and I did not have a single issue there. My nvidia-smi reports CUDA version 11.0, which is exactly the one I want to have. Also, if I try to run:
import os
LD_LIBRARY_PATH = '/home/username/miniconda3/envs/project/lib/'
print(os.path.exists(os.path.join(LD_LIBRARY_PATH, "libcudart.so.11.0")))
it returns True. This is exactly the part of LD_LIBRARY_PATH from the error message where TensorFlow apparently cannot see libcudart.so.11.0 (which IS there).
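One way to narrow this down (a hypothetical diagnostic, not part of the question's original code): os.path.exists only confirms the file is on disk, while TensorFlow actually dlopen()s the library, which can fail for other reasons (wrong architecture, missing transitive dependencies, the directory not being in the loader's search path when the process started). ctypes goes through the same dlopen mechanism:

```python
import ctypes
import os

# path taken from the question's error message
LD_LIBRARY_PATH = '/home/username/miniconda3/envs/project/lib/'
lib = os.path.join(LD_LIBRARY_PATH, "libcudart.so.11.0")

print("exists:", os.path.exists(lib))   # file is present on disk
try:
    ctypes.CDLL(lib)                    # same dlopen() call TensorFlow makes
    print("loadable: True")
except OSError as e:
    print("loadable: False ->", e)      # the dlerror text explains the real failure
```

If "exists" is True but "loadable" is False, the dlerror message printed here is usually more informative than TensorFlow's log line.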
Is there something really obvious that I am missing?
nvidia-smi output:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.156.00 Driver Version: 450.156.00 CUDA Version: 11.0 |
|-------------------------------+----------------------+----------------------+
nvcc output:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243
Upvotes: 36
Views: 157445
Reputation: 21
conda install cudatoolkit
conda install cudnn
and then this problem was resolved.
Upvotes: 1
Reputation: 63
I ran into a similar problem:
temp_can/libtorch/lib/libtorch_cuda.so: undefined reference to `[email protected]'
while trying the example "Installing C++ Distributions of PyTorch". I later found that my CUDA version is 11.7, but on the official PyTorch website they only provide a build for 11.8!
So I took the link they provided: https://download.pytorch.org/libtorch/cu118/libtorch-cxx11-abi-shared-with-deps-2.1.2%2Bcu118.zip changed the 118 to 117, and, step by step, found a version that suits my needs: https://download.pytorch.org/libtorch/cu117/libtorch-cxx11-abi-shared-with-deps-2.0.1%2Bcu117.zip Then I tried again, and the libcudart.so.11.0 problem disappeared. So my suggestion is: check whether your CUDA version matches!
Upvotes: 0
Reputation: 1
Try adding /usr/local/cuda/lib64 to the file /etc/ld.so.conf.d/cuda.conf and then run sudo ldconfig
Upvotes: 0
Reputation: 1
By default, TensorFlow enables GPU acceleration. It loads the relevant dynamic libraries, such as libcudart.so.11.0, without first checking whether the machine actually has a GPU; if the library does not exist on the local computer, this causes the error.
If you have a GPU, please refer to other answers.
If you don't have a GPU, you can modify the default configuration and disable GPU acceleration.
Solution 1:
# Set the environment variable
export CUDA_VISIBLE_DEVICES=-1
Solution 2 (recommended):
# specify that TensorFlow performs computations using the CPU
# (must be set before tensorflow is imported)
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '-1'
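As a sanity check for Solution 1 (a sketch using a hypothetical child process, not from the original answer), you can confirm that CUDA_VISIBLE_DEVICES=-1 is actually visible to a freshly started Python process; the variable only takes effect if it is set before the CUDA runtime initializes:

```python
import os
import subprocess
import sys

# hide all GPUs from CUDA in the child's environment
env = dict(os.environ, CUDA_VISIBLE_DEVICES="-1")

out = subprocess.run(
    [sys.executable, "-c",
     "import os; print(os.environ.get('CUDA_VISIBLE_DEVICES'))"],
    env=env, capture_output=True, text=True,
)
print(out.stdout.strip())  # -1
```

If the child prints -1, any TensorFlow imported inside that process will see no CUDA devices and fall back to the CPU.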
Upvotes: -3
Reputation:
First: find out where "libcudart.so.11.0" actually is. If you lost the name from the error stack, replace "libcudart.so.11.0" below with your own library name:
sudo find / -name 'libcudart.so.11.0'
Output on my system, showing where "libcudart.so.11.0" lives:
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudart.so.11.0
If the result shows nothing, please make sure you have installed CUDA and whatever else must be installed on your system.
Second, add the path to the environment file.
# edit /etc/profile
sudo vim /etc/profile
# append path to "LD_LIBRARY_PATH" in profile file
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-11.1/targets/x86_64-linux/lib
# make environment file work
source /etc/profile
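The effect of the LD_LIBRARY_PATH step can be checked from Python. A minimal sketch (the helper find_in_ld_library_path is hypothetical, for illustration only) that lists which directories on the path actually contain the library:

```python
import os

def find_in_ld_library_path(libname, search_path=None):
    """Return the directories on a colon-separated search path that contain libname."""
    if search_path is None:
        search_path = os.environ.get("LD_LIBRARY_PATH", "")
    return [d for d in search_path.split(":")
            if d and os.path.exists(os.path.join(d, libname))]

# directories on the current LD_LIBRARY_PATH that hold the CUDA runtime
print(find_in_ld_library_path("libcudart.so.11.0"))
```

An empty list means the process's LD_LIBRARY_PATH does not reach the library, e.g. because the export was added after the process (or IDE/notebook server) was started.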
You may also refer to this link
Third thing you may try is:
conda install cudatoolkit
Upvotes: 25
Reputation: 425
Faced the same issue with TensorFlow 2.9 and CUDA 11.7 on Arch Linux x86_64 with 2 NVIDIA GPUs (1080 Ti / Titan RTX) and solved it:
It is not absolutely necessary to match the compatibility matrix exactly (CUDA 11.7 instead of 11.2, i.e. a higher minor version, worked). But the Python 3 version was downgraded according to the TensorFlow compatibility matrix (3.10 to 3.7). Note that you can have multiple CUDA versions installed and manage them via symlinks on Linux (Windows should be a bit different).
Setup with conda and Python 3.7:
sudo pacman -S base-devel cudnn
conda activate tf-2.9
conda uninstall cudatoolkit && conda install cudnn
I also had to update gcc for another lib (off topic):
conda install -c conda-forge gcc=12.1.0
Added the following snippet for debugging, according to the tf-gpu docs:
import tensorflow as tf
tf.config.list_physical_devices('GPU')
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))
I now see 2 GPUs detected instead of 0, and training time is divided by 10. nvidia-smi reports RAM usage maxed out and power draw raised from 9 W to 150 W, validating that the GPU is in use (the other was left idle).
Root cause: cuDNN was not installed system-wide.
Upvotes: 2
Reputation: 1088
Installing the correct versions of CUDA (11.3) and cuDNN (8.2.1) for TF 2.8, based on the tested build configurations at https://www.tensorflow.org/install/source#gpu. Then exporting the LD path (the dynamic link loader path) after finding the library location with:
sudo find / -name 'libcudnn'
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/usr/miniconda3/envs/tf2/lib/
After that, the system was able to find the required libraries and use the GPU for training.
Hope it helped.
Upvotes: 9