Valeria

Reputation: 1232

Could not load dynamic library 'libcudart.so.11.0';

I am trying to use Tensorflow 2.7.0 with GPU, but I am constantly running into the same issue:

2022-02-03 08:32:31.822484: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/username/.cache/pypoetry/virtualenvs/poetry_env/lib/python3.7/site-packages/cv2/../../lib64:/home/username/miniconda3/envs/project/lib/
2022-02-03 08:32:31.822528: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.

This issue has already appeared multiple times here and on GitHub. However, the solutions usually proposed are to a) download the missing CUDA files, b) downgrade/upgrade to the correct CUDA version, or c) set the correct LD_LIBRARY_PATH.

I have already been using this PC with CUDA-enabled PyTorch, and I did not have a single issue there. My nvidia-smi reports CUDA version 11.0, which is exactly the version I want to have. Also, if I try to run:

import os
LD_LIBRARY_PATH = '/home/username/miniconda3/envs/project/lib/'
print(os.path.exists(os.path.join(LD_LIBRARY_PATH, "libcudart.so.11.0")))

it returns True. This is exactly the directory from the LD_LIBRARY_PATH in the error message where TensorFlow apparently cannot see libcudart.so.11.0 (which IS there).

Is there something really obvious that I am missing?

nvidia-smi output:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.156.00   Driver Version: 450.156.00   CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+

nvcc:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243
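As a side note on debugging this: `os.path.exists` only checks the file on disk, while TensorFlow goes through the dynamic loader. A minimal sketch (standard library only) that asks the loader itself whether it can resolve the library:

```python
import ctypes

def loadable(libname: str) -> bool:
    """Return True if the dynamic loader can resolve `libname`."""
    try:
        ctypes.CDLL(libname)
        return True
    except OSError:
        return False

# The name taken from the TensorFlow error message:
print(loadable("libcudart.so.11.0"))
```

If this prints False while the file exists, the directory is simply not on the loader's search path for the running process.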

Upvotes: 36

Views: 157445

Answers (7)

xiaoying jiang

Reputation: 21

conda install cudatoolkit
conda install cudnn

and this problem was resolved.

Upvotes: 1

xiaoming Li

Reputation: 63

I ran into a similar problem:

temp_can/libtorch/lib/libtorch_cuda.so: undefined reference to `[email protected]'

when trying this example: Installing C++ Distributions of PyTorch. I later found that my CUDA version is 11.7, but the official PyTorch website only provides binaries for 11.8!

Then I took the link they provided: https://download.pytorch.org/libtorch/cu118/libtorch-cxx11-abi-shared-with-deps-2.1.2%2Bcu118.zip and tried changing the 118 to 117; step by step, I found a version that suits my needs: https://download.pytorch.org/libtorch/cu117/libtorch-cxx11-abi-shared-with-deps-2.0.1%2Bcu117.zip Then I tried again, and the libcudart.so.11.0 problem disappeared. So my suggestion is: check again whether your CUDA version matches!

Upvotes: 0

wangwei

Reputation: 1

Try adding /usr/local/cuda/lib64 to the file /etc/ld.so.conf.d/cuda.conf and then run sudo ldconfig.
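You can check the effect without root: `ldconfig -p` prints the loader's current cache, so you can see whether libcudart is registered before and after the change. A small sketch (the grep pattern is just the library from the error):

```shell
# List the shared-library cache and look for the CUDA runtime.
# If nothing matches, the loader does not yet know about libcudart.
ldconfig -p 2>/dev/null | grep libcudart || echo "libcudart not in ldconfig cache"
```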

Upvotes: 0

god1000

Reputation: 1

By default, TensorFlow enables GPU acceleration and loads the relevant dynamic libraries, such as libcudart.so.11.0, without first checking whether the machine actually has a GPU. If those libraries do not exist on the machine, this error is the result.

If you have a GPU, please refer to other answers.

If you don't have a GPU, you can modify the default configuration and disable GPU acceleration.

Solution 1:

# Set the environment variable
export CUDA_VISIBLE_DEVICES=-1

Solution 2 (recommended):

# specify that TensorFlow performs computations using the CPU:
# hide all GPUs from TensorFlow (must run before `import tensorflow`)
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '-1'
import tensorflow as tf

Upvotes: -3

user11530462

Reputation:

First, find out where "libcudart.so.11.0" actually is. If your error mentions a different library, substitute its name in the command below:

sudo find / -name 'libcudart.so.11.0'

Output on my system, showing where "libcudart.so.11.0" lives:

/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudart.so.11.0

If the result shows nothing, make sure you have installed CUDA (and anything else that is required) on your system.

Second, add the path to your environment file:

# edit /etc/profile
sudo vim /etc/profile
# append path to "LD_LIBRARY_PATH" in profile file
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-11.1/targets/x86_64-linux/lib
# make environment file work
source /etc/profile

You may also refer to this link

A third thing you may try is:

conda install cudatoolkit
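One caveat worth noting about the second step: glibc reads LD_LIBRARY_PATH once, at process start. Exporting it in the shell (or in /etc/profile as above) therefore works, while setting it from inside an already-running Python process does not change where the loader looks. A small sketch of the distinction (the path is just the example from above):

```python
import os

# Too late for the current process: the dynamic loader captured
# LD_LIBRARY_PATH at startup, so this assignment does not change
# where ctypes / TensorFlow will search for shared libraries now.
os.environ["LD_LIBRARY_PATH"] = "/usr/local/cuda-11.1/targets/x86_64-linux/lib"

# The variable IS inherited by child processes, though, so a script
# re-launched from here (or from a shell that exported it) picks it up.
print(os.environ["LD_LIBRARY_PATH"])
```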

Upvotes: 25

ETdecode

Reputation: 425

I faced the same issue with TensorFlow 2.9 and CUDA 11.7 on Arch Linux x86_64 with 2 NVIDIA GPUs (1080 Ti / Titan RTX) and solved it:

It is not absolutely necessary to respect the compatibility matrix exactly (CUDA 11.7 vs. 11.2, i.e., a slightly higher minor version worked). But I did downgrade Python according to the TensorFlow compatibility matrix (3.10 to 3.7). Note that on Linux you can have multiple CUDA versions installed and manage them via symlinks (Windows should be a bit different).
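The multiple-versions-by-symlink idea can be sketched like this; a temporary directory stands in for /usr/local so the sketch is safe to run anywhere (the real layout would be /usr/local/cuda-11.2, /usr/local/cuda-11.7, and a /usr/local/cuda symlink):

```shell
# Stand-in for /usr/local, so no root access is needed for the demo.
root=$(mktemp -d)
mkdir -p "$root/cuda-11.2" "$root/cuda-11.7"

# Point the unversioned 'cuda' symlink at the active toolkit...
ln -sfn "$root/cuda-11.2" "$root/cuda"
readlink "$root/cuda"

# ...and switch versions later by simply retargeting the symlink.
ln -sfn "$root/cuda-11.7" "$root/cuda"
readlink "$root/cuda"
```

Build paths and LD_LIBRARY_PATH can then reference the stable /usr/local/cuda name while the symlink selects the version.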

Setup with conda and Python 3.7:

  • sudo pacman -S base-devel cudnn
  • conda activate tf-2.9
  • conda uninstall cudatoolkit && conda install cudnn

I also had to update gcc for another lib (off topic):

  • conda install -c conda-forge gcc=12.1.0

I added this snippet for debugging, following the TensorFlow GPU docs:

import tensorflow as tf
tf.config.list_physical_devices('GPU')
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))

I now see 2 GPUs detected instead of 0, and training time is divided by 10. nvidia-smi reports memory usage maxed out and power draw raised from 9 W to 150 W, confirming that the GPU is actually being used (the other was left idle).

Root cause: cuDNN was not installed system-wide.

Upvotes: 2

Vinay Verma

Reputation: 1088

I installed the correct versions, CUDA 11.3 and cuDNN 8.2.1 for TF 2.8, based on the tested-configurations table at https://www.tensorflow.org/install/source#gpu, using the following commands:

  • conda uninstall cudatoolkit
  • conda install cudnn

Then I exported the LD path (the dynamic link loader path) after finding the location with sudo find / -name 'libcudnn*'. After that, the system was able to find the required libraries and use the GPU for training.

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/usr/miniconda3/envs/tf2/lib/

Hope it helps.

Upvotes: 9
