theastronomist

Reputation: 1056

Could not load dynamic library 'libcublasLt.so.11'; dlerror: libcublasLt.so.11: cannot open shared object file: No such file or directory

I just updated my graphics card drivers with:

sudo apt install nvidia-driver-470
sudo apt install cuda-drivers-470

I installed them this way because they were being held back when I ran sudo apt upgrade. I then mistakenly ran sudo apt autoremove to clean up old packages. After restarting my computer so the new drivers would be set up properly, I could no longer use GPU acceleration with TensorFlow.

import tensorflow as tf
tf.test.is_gpu_available()
WARNING:tensorflow:From <stdin>:1: is_gpu_available (from tensorflow.python.framework.test_util) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.config.list_physical_devices('GPU')` instead.
2021-12-07 16:52:01.771391: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-12-07 16:52:01.807283: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:939] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-12-07 16:52:01.807973: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2021-12-07 16:52:01.808017: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcublas.so.11'; dlerror: libcublas.so.11: cannot open shared object file: No such file or directory
2021-12-07 16:52:01.808048: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcublasLt.so.11'; dlerror: libcublasLt.so.11: cannot open shared object file: No such file or directory
2021-12-07 16:52:01.856391: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusolver.so.11'; dlerror: libcusolver.so.11: cannot open shared object file: No such file or directory
2021-12-07 16:52:01.856466: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusparse.so.11'; dlerror: libcusparse.so.11: cannot open shared object file: No such file or directory
2021-12-07 16:52:01.857601: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1850] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
False
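The deprecation warning in the output above points to `tf.config.list_physical_devices('GPU')`. A minimal sketch of that check, assuming TensorFlow 2.1 or later; the helper name `visible_gpus` is made up for illustration, and it returns None when TensorFlow cannot be imported:

```python
def visible_gpus():
    """Return names of GPUs TensorFlow can see, or None if TensorFlow is missing."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    # Non-deprecated replacement for tf.test.is_gpu_available().
    return [device.name for device in tf.config.list_physical_devices("GPU")]


if __name__ == "__main__":
    gpus = visible_gpus()
    print("TensorFlow not installed" if gpus is None else f"GPUs: {gpus}")
```

An empty list here corresponds to the `False` returned by the deprecated call.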

Upvotes: 9

Views: 51757

Answers (3)

BSQ

Reputation: 955

Based on my experience setting up this kind of environment, install the official NVIDIA libraries from the following links:

CUDA Toolkit

https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&Distribution=WSL-Ubuntu&target_version=2.0&target_type=deb_network

wget https://developer.download.nvidia.com/compute/cuda/repos/wsl-ubuntu/x86_64/cuda-wsl-ubuntu.pin
sudo mv cuda-wsl-ubuntu.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/12.6.2/local_installers/cuda-repo-wsl-ubuntu-12-6-local_12.6.2-1_amd64.deb
sudo dpkg -i cuda-repo-wsl-ubuntu-12-6-local_12.6.2-1_amd64.deb
sudo cp /var/cuda-repo-wsl-ubuntu-12-6-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cuda-toolkit-12-6

cuDNN

https://developer.nvidia.com/cudnn-downloads?target_os=Linux&target_arch=x86_64&Distribution=Ubuntu&target_version=22.04&target_type=deb_local

wget https://developer.download.nvidia.com/compute/cudnn/9.5.0/local_installers/cudnn-local-repo-ubuntu2204-9.5.0_1.0-1_amd64.deb
sudo dpkg -i cudnn-local-repo-ubuntu2204-9.5.0_1.0-1_amd64.deb
sudo cp /var/cudnn-local-repo-ubuntu2204-9.5.0/cudnn-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cudnn-cuda-12

TensorRT

https://developer.nvidia.com/tensorrt/download/10x
https://docs.nvidia.com/deeplearning/tensorrt/archives/index.html#trt_10
https://docs.nvidia.com/deeplearning/tensorrt/archives/tensorrt-1050/install-guide/index.html

wget https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.5.0/local_repo/nv-tensorrt-local-repo-ubuntu2204-10.5.0-cuda-12.6_1.0-1_amd64.deb
sudo dpkg -i nv-tensorrt-local-repo-ubuntu2204-10.5.0-cuda-12.6_1.0-1_amd64.deb
sudo cp /var/nv-tensorrt-local-repo-ubuntu2204-10.5.0-cuda-12.6/*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get install tensorrt

Installing all three up front helps ensure that no required library is missing later on.

Upvotes: 0

Bruno Laporais Pereira

Reputation: 274

Have you installed the CUDA toolkit? The error indicates that version 11 of the CUDA libraries cannot be found. It is also possible that the installed cudatoolkit and cuDNN versions are incompatible with your TensorFlow version.
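To see which of those libraries the dynamic loader can actually open, without starting TensorFlow, something like the following sketch may help. It uses `ctypes.CDLL`, which goes through the same `dlopen` mechanism that produces the `dlerror` messages in the question. The function names are illustrative, and the soname list is copied from the warnings in the question:

```python
import ctypes

# Sonames copied from the "Could not load dynamic library" warnings.
REQUIRED_LIBS = [
    "libcudart.so.11.0",
    "libcublas.so.11",
    "libcublasLt.so.11",
    "libcusolver.so.11",
    "libcusparse.so.11",
]


def can_load(soname):
    """Return True if the dynamic loader can open the given shared library."""
    try:
        ctypes.CDLL(soname)
    except OSError:
        return False
    return True


def missing_libs(sonames=REQUIRED_LIBS):
    """Return the subset of sonames that the dynamic loader cannot open."""
    return [name for name in sonames if not can_load(name)]


if __name__ == "__main__":
    for name in missing_libs():
        print(f"missing: {name}")
```

If this prints nothing, the loader can find every library in the list from the current environment.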

If you already installed the correct version of the toolkit, go directly to Step 5. (You can check the version with the command nvcc --version).
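The `nvcc --version` check can also be scripted. This is a rough sketch that assumes the output contains a fragment of the form `release 11.4`, as recent toolkits print; the helper names are hypothetical:

```python
import re
import shutil
import subprocess


def parse_release(nvcc_output):
    """Extract the X.Y version that follows the word 'release', if any."""
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def installed_cuda_release():
    """Return the toolkit release reported by nvcc, or None if nvcc is not on PATH."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    result = subprocess.run(
        [nvcc, "--version"], capture_output=True, text=True, check=False
    )
    return parse_release(result.stdout)


if __name__ == "__main__":
    print(installed_cuda_release())
```

A result of None means either that nvcc is not on PATH (see Steps 5 and 6) or that the output did not match the assumed format.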

  1. Download the installer from https://developer.nvidia.com/cuda-11-4-4-download-archive?target_os=Linux (this version is compatible with the driver nvidia-470 you installed). The next steps are specific to the runfile option.

  2. As you already installed nvidia-drivers, press Continue if this message appears.


  3. Accept the terms.


  4. Again, as you already installed the drivers, just disable the Driver option and press Install.


  5. Now you need to configure the paths for binaries and libraries. Use the find command to search for nvcc and libcublas.so.*:

    sudo find / -name 'nvcc'  # Path to binaries
    sudo find / -name 'libcublas.so.*'  # Path to libraries
    
  6. Finally, add the following lines to the end of ~/.profile, adjusted to the paths you found above. On my system, CUDA was installed in /usr/local/cuda-11.4.

    if [ -d "/usr/local/cuda-11.4" ]; then
        PATH=/usr/local/cuda-11.4/bin${PATH:+:${PATH}}
        LD_LIBRARY_PATH=/usr/local/cuda-11.4/targets/x86_64-linux/lib/${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
        # LD_LIBRARY_PATH is often not exported yet, so export both explicitly
        export PATH LD_LIBRARY_PATH
    fi
    

If updating ~/.profile doesn't work, try updating ~/.bashrc, or ~/.zshrc if you use zsh instead of bash.

  7. Restart the computer.
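Steps 5 and 6 can be double-checked from Python after logging back in. The sketch below is only an illustration: it searches under an assumed root of /usr/local instead of /, and the function names are made up. It lists the directories that contain libcublas and reports whether each one is present in LD_LIBRARY_PATH:

```python
import os
from pathlib import Path


def find_library_dirs(root, pattern="libcublas.so.*"):
    """Return the sorted set of directories under root that contain a matching file."""
    return sorted({str(path.parent) for path in Path(root).rglob(pattern)})


def on_library_path(directory, env=None):
    """Return True if directory is listed in LD_LIBRARY_PATH."""
    env = os.environ if env is None else env
    entries = env.get("LD_LIBRARY_PATH", "").split(":")
    wanted = os.path.normpath(directory)
    # normpath drops a trailing slash, so "lib/" and "lib" compare equal.
    return any(entry and os.path.normpath(entry) == wanted for entry in entries)


if __name__ == "__main__":
    # /usr/local is an assumed search root; the answer searches from /.
    for directory in find_library_dirs("/usr/local"):
        status = "ok" if on_library_path(directory) else "not in LD_LIBRARY_PATH"
        print(f"{directory}: {status}")
```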

Upvotes: 13

theastronomist

Reputation: 1056

You can create symlinks inside the /usr/lib/x86_64-linux-gnu directory. I found it with:

$ whereis libcudart
libcudart: /usr/lib/x86_64-linux-gnu/libcudart.so /usr/share/man/man7/libcudart.7.gz

This folder contains other versions of those CUDA libraries. Create symlinks to them as shown here; the exact version you link to may differ slightly.

$ cd /usr/lib/x86_64-linux-gnu
$ sudo ln -s libcublas.so.10.2.1.243 libcublas.so.11
$ sudo ln -s libcublasLt.so.10.2.1.243 libcublasLt.so.11
$ sudo ln -s libcusolver.so.10.2.0.243 libcusolver.so.11
$ sudo ln -s libcusparse.so.10.3.0.243 libcusparse.so.11

Now your GPU should be detected.

>>> import tensorflow as tf
>>> tf.test.is_gpu_available()
WARNING:tensorflow:From <stdin>:1: is_gpu_available (from tensorflow.python.framework.test_util) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.config.list_physical_devices('GPU')` instead.
2021-12-07 17:07:26.914296: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-12-07 17:07:26.950731: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:939] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-12-07 17:07:27.029687: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:939] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-12-07 17:07:27.030421: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:939] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-12-07 17:07:27.325218: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:939] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-12-07 17:07:27.325642: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:939] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-12-07 17:07:27.326022: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:939] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-12-07 17:07:27.326408: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /device:GPU:0 with 9280 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 3060, pci bus id: 0000:06:00.0, compute capability: 8.6
True

This method works because these CUDA libraries are similar enough that even NVIDIA often ships them as symlinks. If TensorFlow is looking for libcublas.so.11, you can create a symlink with that name that points to another version of libcublas that is already installed.
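To see which links would be needed before running anything with sudo, a dry-run sketch like this one can print the `ln -s` commands for a given directory. The function name and the "pick the last name in sorted order" rule are my own assumptions, not part of any tool; it only reads the directory and creates nothing:

```python
from pathlib import Path

# Sonames TensorFlow asked for in the question's log.
WANTED = ["libcublas.so.11", "libcublasLt.so.11", "libcusolver.so.11", "libcusparse.so.11"]


def plan_symlinks(lib_dir, wanted=WANTED):
    """Map each missing soname to an installed versioned file with the same base name."""
    lib_dir = Path(lib_dir)
    plan = {}
    for soname in wanted:
        if (lib_dir / soname).exists():
            continue  # already present, nothing to link
        base = soname.split(".so.")[0]  # e.g. "libcublas"
        candidates = sorted(
            p.name
            for p in lib_dir.glob(f"{base}.so.*")
            if p.is_file() and not p.is_symlink()
        )
        if candidates:
            # Plain string sort, not a true version sort; review before linking.
            plan[soname] = candidates[-1]
    return plan


if __name__ == "__main__":
    for link, target in plan_symlinks("/usr/lib/x86_64-linux-gnu").items():
        print(f"sudo ln -s {target} {link}")
```

Review the printed commands and run them from inside that directory, as in the commands above.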

Upvotes: 6
