Julian Moore

Reputation: 1028

TensorFlow libdevice not found. Why is it not found in the searched path?

Win 10 64-bit 21H1; TF2.5, CUDA 11 installed in environment (Python 3.9.5 Xeus)

I am not the only one seeing this error; see also (unanswered) here and here. The issue is obscure and the proposed resolutions are unclear/don't seem to work (see e.g. here)

Issue: Using the TF Linear_Mixed_Effects_Models.ipynb example (downloaded from the TensorFlow GitHub here), execution reaches the point of performing the "warm up stage" and then throws the error:

InternalError: libdevice not found at ./libdevice.10.bc [Op:__inference_one_e_step_2806]

The console contains this output, showing that it finds the GPU but XLA initialisation fails to find the (existing!) libdevice in the specified paths:

2021-08-01 22:04:36.691300: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 9623 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
2021-08-01 22:04:37.080007: W tensorflow/python/util/util.cc:348] Sets are not currently considered sequences, but this may change in the future, so consider avoiding using them.
2021-08-01 22:04:54.122528: I tensorflow/compiler/xla/service/service.cc:169] XLA service 0x1d724940130 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2021-08-01 22:04:54.127766: I tensorflow/compiler/xla/service/service.cc:177]   StreamExecutor device (0): NVIDIA GeForce GTX 1080 Ti, Compute Capability 6.1
2021-08-01 22:04:54.215072: W tensorflow/compiler/tf2xla/kernels/random_ops.cc:241] Warning: Using tf.random.uniform with XLA compilation will ignore seeds; consider using tf.random.stateless_uniform instead if reproducible behavior is desired.
2021-08-01 22:04:55.506464: W tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:73] Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice. This may result in compilation or runtime failures, if the program we try to run uses routines from libdevice.
2021-08-01 22:04:55.512876: W tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:74] Searched for CUDA in the following directories:
2021-08-01 22:04:55.517387: W tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:77]   C:/Users/Julian/anaconda3/envs/TF250_PY395_xeus/Library/bin
2021-08-01 22:04:55.520773: W tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:77]   C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.2
2021-08-01 22:04:55.524125: W tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:77]   .
2021-08-01 22:04:55.526349: W tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:79] You can choose the search directory by setting xla_gpu_cuda_data_dir in HloModule's DebugOptions.  For most apps, setting the environment variable XLA_FLAGS=--xla_gpu_cuda_data_dir=/path/to/cuda will work.

Now the interesting thing is that the paths searched include "C:/Users/Julian/anaconda3/envs/TF250_PY395_xeus/Library/bin"

The contents of that folder include all the DLLs successfully loaded at TF startup, including cudart64_110.dll, cudnn64_8.dll... and of course libdevice.10.bc

Question: Since TF says it is searching this location for this file, and the file exists there, what is wrong and how do I fix it?

(NB C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.2 does not exist... CUDA is installed in the environment; this path must be a best guess for an OS installation)

Info: I am setting the path by

import os

# point XLA at the CUDA files inside the conda environment
aPath = '--xla_gpu_cuda_data_dir=C:/Users/Julian/anaconda3/envs/TF250_PY395_xeus/Library/bin'
print(aPath)
os.environ['XLA_FLAGS'] = aPath

but I have also set an OS environment variable XLA_FLAGS to the same string value... I don't know which one is actually working yet, but the fact that the console output says it searched the intended path is good enough

Upvotes: 33

Views: 47056

Answers (12)

Jinsong Zhang

Reputation: 1

In Linux, just creating a symbolic link to the conda libdevice file in the root directory of the application also worked for me:

ln -s $CONDA_PREFIX/lib/libdevice.10.bc libdevice.10.bc

Upvotes: 0

Jeff Hansen

Reputation: 57

I was having a similar error:

2024-07-02 14:11:12.392126: W external/local_xla/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:510] 
Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice. This may result in 
compilation or runtime failures, if the program we try to run uses routines 
from libdevice.
Searched for CUDA in the following directories:
  ./cuda_sdk_lib
  /usr/local/cuda-12.3
  /usr/local/cuda
  /home/spotparking/.local/lib/python3.10/site-packages/tensorflow/python/platform/../../../nvidia/cuda_nvcc
  /home/spotparking/.local/lib/python3.10/site-packages/tensorflow/python/platform/../../../../nvidia/cuda_nvcc
  .

You can choose the search directory by setting xla_gpu_cuda_data_dir in HloModule's DebugOptions.  

For most apps, setting the environment variable XLA_FLAGS=--xla_gpu_cuda_data_dir=/path/to/cuda will work.

and following the instructions in the error message worked for me.

I first had to install the nvidia-cuda-nvcc package by running:

python3 -m pip install nvidia-pyindex
python3 -m pip install nvidia-cuda-nvcc

I then ran this command to find the path to cuda_nvcc:

find / -type d -name "cuda_nvcc" 2>/dev/null

I copied that path, and then I exported the environment variable with:

export XLA_FLAGS=--xla_gpu_cuda_data_dir=/copied/path/to/cuda_nvcc

Upvotes: 1

Paul Weibert

Reputation: 314

I had the same issue on Ubuntu 22.04 using TensorFlow in JupyterLab. The following steps solved the problem:

  1. Copy libdevice.10.bc to the working folder (where JupyterLab is started)
  2. export XLA_FLAGS=--xla_gpu_cuda_data_dir=/usr/lib/cuda/

or

  1. In Python code (before training): os.environ["XLA_FLAGS"] = "--xla_gpu_cuda_data_dir=/usr/lib/cuda/"
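A minimal sketch of the second variant, assuming the /usr/lib/cuda path from the steps above; setting the flag before importing TensorFlow is the safest ordering:

import os

# point XLA at the directory whose nvvm/libdevice subfolder holds libdevice.10.bc
os.environ["XLA_FLAGS"] = "--xla_gpu_cuda_data_dir=/usr/lib/cuda/"

import tensorflow as tf  # import TF only after the flag is set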

Upvotes: 1

Yalda

Reputation: 31

In my case I noticed that there was an error regarding Adam at the final line:

libdevice not found at ./libdevice.10.bc
         [[{{node Adam/StatefulPartitionedCall_88}}]] [Op:__inference_train_function_10134]

I changed this line: from keras.optimizers import Adam

to this: from keras.optimizers.legacy import Adam

and it worked. It was suggested in this link: https://github.com/keras-team/tf-keras/issues/62

There are some other suggestions for this kind of error.
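For illustration, a minimal sketch of the swap (the model below is hypothetical, only to show where the legacy optimizer is passed):

import tensorflow as tf
from keras.optimizers.legacy import Adam  # instead of: from keras.optimizers import Adam

# hypothetical minimal model, just to illustrate compile() with the legacy optimizer
model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
model.compile(optimizer=Adam(learning_rate=1e-3), loss="mse")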

Upvotes: 3

Min Gao

Reputation: 31

I met the same error with TensorFlow 2.11, CUDA 11.2, cuDNN 8.1.0. Because I used conda to build the env, there is no nvvm directory, no need to export the environment variable, and the command nvcc -V can't be used, so many of the suggestions I found did not fit my problem. I solved the error by downgrading TensorFlow to 2.10. Use conda install tensorflow=2.10.0 cudatoolkit cudnn to downgrade your TensorFlow version and its dependencies. Reference: https://github.com/tensorflow/tensorflow/issues/58681

Upvotes: 3

Krzysztof Cichocki

Reputation: 6414

I had the same problem on a fresh install of Ubuntu 24.04 with an Nvidia RTX 3090. I used the instructions from this page: https://www.tensorflow.org/install/pip and I couldn't run model.fit because it gave me the error. Then, in an attempt to resolve the issue, I installed this driver: NVIDIA-Linux-x86_64-525.105.17.run, but it didn't help.

I believe this actually solved the issue:

sudo apt-get install cuda-toolkit

Upvotes: 5

Mauricio Matsumura

Reputation: 51

For those using miniconda, just copy the file libdevice.10.bc into the root folder of the Python application or notebook.

It works here using python=3.9, cudatoolkit=11.2, cudnn=8.1.0, and tensorflow==2.9

Upvotes: 5

Little Train

Reputation: 902

For those using Windows and PowerShell, assuming CUDA is in C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.7

The environment can be set as:

$env:XLA_FLAGS="--xla_gpu_cuda_data_dir='C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.7'"

Here "''", i.e. nested quotes, is required!

I think this may be the lightest way to deal with this XLA bug.

Upvotes: 5

Nishikanta Parida

Reputation: 501

For Windows users

Step-1

run (as administrator)

conda install -c anaconda cudatoolkit

You can specify the cudatoolkit version as per your installed cuDNN / supported version, e.g. conda install -c anaconda cudatoolkit=10.2.89

Step-2

Go to the installed conda folder

C:\ProgramData\Anaconda3\Library\bin

Step-3

locate "libdevice.10.bc" ,copy the file

Step-4

Create a folder named "nvvm" inside bin

Create another folder named "libdevice" inside nvvm

Paste the "libdevice.10.bc" file inside "libdevice"

Step-5

Go to environment variables

System variables > New

variable name:

XLA_FLAGS

variable value:

--xla_gpu_cuda_data_dir=C:\ProgramData\Anaconda3\Library\bin

(edit above as per your directory)

Step-6: Restart the cmd / virtual env
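The folder and variable manipulation in Steps 2-5 can also be scripted; a rough Python sketch (paths assume the default Anaconda location used above):

import os
import shutil

# default Anaconda location from Step-2; adjust to your install
bin_dir = r"C:\ProgramData\Anaconda3\Library\bin"
dest = os.path.join(bin_dir, "nvvm", "libdevice")

os.makedirs(dest, exist_ok=True)                             # Step-4: create bin\nvvm\libdevice
shutil.copy(os.path.join(bin_dir, "libdevice.10.bc"), dest)  # Steps 3-4: copy the file into it

# Step-5 equivalent for the current process only; use the System variables
# dialog described above for a persistent setting
os.environ["XLA_FLAGS"] = "--xla_gpu_cuda_data_dir=" + bin_dir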

Upvotes: 14

Brendan Darrer

Reputation: 509

The following worked for me, with error message:

error: Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice

Firstly I searched for the nvvm directory and then verified that the libdevice directory existed:

$ find / -type d -name nvvm 2>/dev/null
/usr/lib/cuda/nvvm
$ cd /usr/lib/cuda/nvvm
/usr/lib/cuda/nvvm$ ls
libdevice
/usr/lib/cuda/nvvm$ cd libdevice
/usr/lib/cuda/nvvm/libdevice$ ls
libdevice.10.bc

Then I exported the environment variable:

export XLA_FLAGS=--xla_gpu_cuda_data_dir=/usr/lib/cuda

as shown by @Insectatorious above. This solved the error and I was able to run the code.

Upvotes: 36

Insectatorious

Reputation: 1335

For Linux users with tensorflow==2.8, add the following environment variable.

XLA_FLAGS=--xla_gpu_cuda_data_dir=/usr/local/cuda-11.4

Upvotes: 2

Julian Moore

Reputation: 1028

The diagnostic information is unclear and thus unhelpful; there is, however, a resolution.

The issue was resolved by providing the file (as a copy) at this path:

C:\Users\Julian\anaconda3\envs\TF250_PY395_xeus\Library\bin\nvvm\libdevice\

Note that C:\Users\Julian\anaconda3\envs\TF250_PY395_xeus\Library\bin was the path given to XLA_FLAGS, but it seems it is not looking for the libdevice file there; it is looking for the \nvvm\libdevice\ path. This means that I can't just set a different value in XLA_FLAGS to point to the actual location of the libdevice file because, to coin a phrase, it's not (just) the file it's looking for.

The debug info earlier:

2021-08-05 08:38:52.889213: W tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:73] Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice. This may result in compilation or runtime failures, if the program we try to run uses routines from libdevice.
2021-08-05 08:38:52.896033: W tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:74] Searched for CUDA in the following directories:
2021-08-05 08:38:52.899128: W tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:77]   C:/Users/Julian/anaconda3/envs/TF250_PY395_xeus/Library/bin
2021-08-05 08:38:52.902510: W tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:77]   C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.2
2021-08-05 08:38:52.905815: W tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:77]   .

is incorrect insofar as there is no "CUDA" in the search path; and FWIW I think a different error should have been given for searching in C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.2, since there is no such folder (there's an old v10.0 folder there, but no OS install of CUDA 11).

Until/unless path handling is improved by TensorFlow, such file structure manipulation is needed in every new (Anaconda) Python environment.

Full thread in TensorFlow forum here

Upvotes: 6
