Reputation: 1028
Win 10 64-bit 21H1; TF2.5, CUDA 11 installed in environment (Python 3.9.5 Xeus)
I am not the only one seeing this error; see also (unanswered) here and here. The issue is obscure and the proposed resolutions are unclear/don't seem to work (see e.g. here)
Issue: Using the TF Linear_Mixed_Effects_Models.ipynb example (downloaded from the TensorFlow GitHub here), execution reaches the point of performing the "warm up stage" and then throws the error:
InternalError: libdevice not found at ./libdevice.10.bc [Op:__inference_one_e_step_2806]
The console contains this output, showing that TF finds the GPU but that XLA initialisation fails to find the (existing!) libdevice in the specified paths:
2021-08-01 22:04:36.691300: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 9623 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
2021-08-01 22:04:37.080007: W tensorflow/python/util/util.cc:348] Sets are not currently considered sequences, but this may change in the future, so consider avoiding using them.
2021-08-01 22:04:54.122528: I tensorflow/compiler/xla/service/service.cc:169] XLA service 0x1d724940130 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2021-08-01 22:04:54.127766: I tensorflow/compiler/xla/service/service.cc:177] StreamExecutor device (0): NVIDIA GeForce GTX 1080 Ti, Compute Capability 6.1
2021-08-01 22:04:54.215072: W tensorflow/compiler/tf2xla/kernels/random_ops.cc:241] Warning: Using tf.random.uniform with XLA compilation will ignore seeds; consider using tf.random.stateless_uniform instead if reproducible behavior is desired.
2021-08-01 22:04:55.506464: W tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:73] Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice. This may result in compilation or runtime failures, if the program we try to run uses routines from libdevice.
2021-08-01 22:04:55.512876: W tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:74] Searched for CUDA in the following directories:
2021-08-01 22:04:55.517387: W tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:77] C:/Users/Julian/anaconda3/envs/TF250_PY395_xeus/Library/bin
2021-08-01 22:04:55.520773: W tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:77] C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.2
2021-08-01 22:04:55.524125: W tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:77] .
2021-08-01 22:04:55.526349: W tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:79] You can choose the search directory by setting xla_gpu_cuda_data_dir in HloModule's DebugOptions. For most apps, setting the environment variable XLA_FLAGS=--xla_gpu_cuda_data_dir=/path/to/cuda will work.
Now the interesting thing is that the paths searched include "C:/Users/Julian/anaconda3/envs/TF250_PY395_xeus/Library/bin"
The content of that folder includes all the DLLs successfully loaded at TF startup, including cudart64_110.dll, cudnn64_8.dll... and of course libdevice.10.bc
Question: Since TF says it is searching this location for this file and the file exists there, what is wrong and how do I fix it?
(NB C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.2
does not exist... CUDA is installed in the environment; this path must be a best guess for an OS installation)
Info: I am setting the path with:
import os

aPath = '--xla_gpu_cuda_data_dir=C:/Users/Julian/anaconda3/envs/TF250_PY395_xeus/Library/bin'
print(aPath)
os.environ['XLA_FLAGS'] = aPath
but I have also set an OS environment variable XLA_FLAGS to the same string value. I don't know which one is actually taking effect yet, but the fact that the console output says it searched the intended path is good enough.
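For what it's worth, a quick way to check which value the Python process will actually hand to XLA is simply to read the variable back; the in-process assignment above replaces any value inherited from the OS for that process (a minimal check using only the standard library):
import os
print(os.environ.get('XLA_FLAGS'))  # whichever setting won is what XLA will see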
Upvotes: 33
Views: 47056
Reputation: 1
On Linux, just creating a symbolic link to the conda-provided file in the root directory of the application also worked for me (assuming the conda cudatoolkit puts it under $CONDA_PREFIX/lib):
ln -s $CONDA_PREFIX/lib/libdevice.10.bc libdevice.10.bc
Upvotes: 0
Reputation: 57
I was having a similar error:
2024-07-02 14:11:12.392126: W external/local_xla/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:510]
Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice. This may result in
compilation or runtime failures, if the program we try to run uses routines
from libdevice.
Searched for CUDA in the following directories:
./cuda_sdk_lib
/usr/local/cuda-12.3
/usr/local/cuda
/home/spotparking/.local/lib/python3.10/site-packages/tensorflow/python/platform/../../../nvidia/cuda_nvcc
/home/spotparking/.local/lib/python3.10/site-packages/tensorflow/python/platform/../../../../nvidia/cuda_nvcc
.
You can choose the search directory by setting xla_gpu_cuda_data_dir in HloModule's DebugOptions.
For most apps, setting the environment variable XLA_FLAGS=--xla_gpu_cuda_data_dir=/path/to/cuda will work.
and following the instructions in the error message worked for me.
I first had to install the nvidia-cuda-nvcc package by running:
python3 -m pip install nvidia-pyindex
python3 -m pip install nvidia-cuda-nvcc
I then ran this command to find the path to cuda_nvcc:
find / -type d -name "cuda_nvcc" 2>/dev/null
I copied that path, and then I exported the environment variable with:
export XLA_FLAGS=--xla_gpu_cuda_data_dir=/copied/path/to/cuda_nvcc
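If you would rather do the equivalent from inside a Python session (e.g. a notebook), a minimal sketch along these lines should work; it assumes the pip-installed package lands under site-packages/nvidia/cuda_nvcc, as in the search log above, and the variable must be set before TensorFlow compiles anything with XLA:
import glob
import os
import site

# Look for the pip-installed cuda_nvcc directory in user and global site-packages.
roots = site.getsitepackages() + [site.getusersitepackages()]
candidates = [p for r in roots for p in glob.glob(os.path.join(r, "nvidia", "cuda_nvcc"))]
if candidates:
    os.environ["XLA_FLAGS"] = "--xla_gpu_cuda_data_dir=" + candidates[0]

import tensorflow as tf  # import after XLA_FLAGS is set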
Upvotes: 1
Reputation: 314
I had the same issue on Ubuntu 22.04 using TensorFlow in jupyter-lab. The following steps solved the problem:
Copy libdevice.10.bc to the working folder (where jupyter-lab is started), then set
export XLA_FLAGS=--xla_gpu_cuda_data_dir=/usr/lib/cuda/
or
os.environ["XLA_FLAGS"] = "--xla_gpu_cuda_data_dir=/usr/lib/cuda/"
Upvotes: 1
Reputation: 31
In my case I noticed that there was an error regarding Adam in the final line:
libdevice not found at ./libdevice.10.bc
[[{{node Adam/StatefulPartitionedCall_88}}]] [Op:__inference_train_function_10134]
I changed this line:
from keras.optimizers import Adam
to this:
from keras.optimizers.legacy import Adam
and it worked. It was suggested in this link: https://github.com/keras-team/tf-keras/issues/62, which also has some other suggestions for this kind of error.
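For context, a minimal sketch of the swap in a compile call might look like this; the model itself is only illustrative, and keras.optimizers.legacy exists only in Keras versions that still ship the legacy optimizers (e.g. those bundled with TF 2.11-2.15):
import tensorflow as tf
from keras.optimizers.legacy import Adam  # the swap that resolved the libdevice error here

model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
model.compile(optimizer=Adam(learning_rate=1e-3), loss="mse")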
Upvotes: 3
Reputation: 31
I met the same error with TensorFlow 2.11, CUDA 11.2 and cuDNN 8.1.0. Because I built the env with conda, there is no nvvm directory, no environment variable to export, and the nvcc -V command is unavailable, so many of the suggestions I found did not fit my problem.
I solved the error by downgrading TensorFlow to 2.10. Use
conda install tensorflow=2.10.0 cudatoolkit cudnn
to downgrade your TensorFlow version and its dependencies.
Reference: https://github.com/tensorflow/tensorflow/issues/58681
Upvotes: 3
Reputation: 6414
I had the same problem on a fresh install of Ubuntu 24.04 with an Nvidia RTX 3090. I used the instructions from this page: https://www.tensorflow.org/install/pip, and I couldn't run model.fit because it gave me the error. Then, in an attempt to resolve the issue, I installed this driver: NVIDIA-Linux-x86_64-525.105.17.run, but it didn't help.
I believe this actually solved the issue:
sudo apt-get install cuda-toolkit
Upvotes: 5
Reputation: 51
For those using miniconda, just copy the file libdevice.10.bc into the root folder of the Python application or notebook.
It works here using python=3.9, cudatoolkit=11.2, cudnn=8.1.0, and tensorflow==2.9
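If you'd rather not copy the file by hand, a small sketch like the following can do it from the notebook itself; it assumes the conda cudatoolkit puts libdevice.10.bc under $CONDA_PREFIX/lib (Linux) or %CONDA_PREFIX%\Library\bin (Windows), so adjust the paths if your layout differs:
import glob
import os
import shutil

prefix = os.environ.get("CONDA_PREFIX", "")
# Typical locations for libdevice.10.bc in a conda env with cudatoolkit installed.
matches = glob.glob(os.path.join(prefix, "lib", "libdevice.10.bc")) + \
          glob.glob(os.path.join(prefix, "Library", "bin", "libdevice.10.bc"))
if matches:
    shutil.copy(matches[0], "libdevice.10.bc")  # drop it next to the notebook/app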
Upvotes: 5
Reputation: 902
For those using Windows and PowerShell, assuming CUDA is in C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.7,
The environment can be set as:
$env:XLA_FLAGS="--xla_gpu_cuda_data_dir='C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.7'"
Here "''"
, i.e. nested quotes, is required!
I think this may be the lightest way to deal with this XLA bug.
Upvotes: 5
Reputation: 501
For Windows users:
Step-1
run (as administrator)
conda install -c anaconda cudatoolkit
You can specify the cudatoolkit version as per your installed CUDA / cuDNN supported version, e.g. conda install -c anaconda cudatoolkit=10.2.89
Step-2
go to the installed conda folder
C:\ProgramData\Anaconda3\Library\bin
Step-3
locate "libdevice.10.bc" ,copy the file
Step-4
create a folder named "nvvm" inside bin
create another folder named "libdevice" inside nvvm
paste the "libdevice.10.bc" file inside "libdevice"
Step-5
go to Environment Variables
System variables >New
variable name:
XLA_FLAGS
variable value:
--xla_gpu_cuda_data_dir=C:\ProgramData\Anaconda3\Library\bin
(edit above as per your directory)
Step-6
restart the cmd / virtual env
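If you end up repeating Steps 2-5 in several environments, a small sketch like this can script them; the path follows this answer's C:\ProgramData\Anaconda3 layout, so edit it as per your directory (and note that setting os.environ only covers the current process, unlike the system variable of Step-5):
import os
import shutil

bin_dir = r"C:\ProgramData\Anaconda3\Library\bin"  # edit as per your directory

# Steps 3-4: create nvvm\libdevice inside bin and paste libdevice.10.bc there.
libdevice_dir = os.path.join(bin_dir, "nvvm", "libdevice")
os.makedirs(libdevice_dir, exist_ok=True)
shutil.copy(os.path.join(bin_dir, "libdevice.10.bc"), libdevice_dir)

# Step-5 equivalent, for the current process only.
os.environ["XLA_FLAGS"] = "--xla_gpu_cuda_data_dir=" + bin_dir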
Upvotes: 14
Reputation: 509
The following worked for me, given the error message:
error: Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice
Firstly I searched for the nvvm directory and then verified that the libdevice directory existed:
$ find / -type d -name nvvm 2>/dev/null
/usr/lib/cuda/nvvm
$ cd /usr/lib/cuda/nvvm
/usr/lib/cuda/nvvm$ ls
libdevice
/usr/lib/cuda/nvvm$ cd libdevice
/usr/lib/cuda/nvvm/libdevice$ ls
libdevice.10.bc
Then I exported the environment variable:
export XLA_FLAGS=--xla_gpu_cuda_data_dir=/usr/lib/cuda
as shown by @Insectatorious above. This solved the error and I was able to run the code.
Upvotes: 36
Reputation: 1335
For Linux users with tensorflow==2.8, add the following environment variable:
XLA_FLAGS=--xla_gpu_cuda_data_dir=/usr/local/cuda-11.4
Upvotes: 2
Reputation: 1028
The diagnostic information is unclear and thus unhelpful; there is, however, a resolution.
The issue was resolved by providing the file (as a copy) at this path
C:\Users\Julian\anaconda3\envs\TF250_PY395_xeus\Library\bin\nvvm\libdevice\
Note that C:\Users\Julian\anaconda3\envs\TF250_PY395_xeus\Library\bin
was the path given to XLA_FLAGS, but it seems XLA is not looking for the libdevice file there; it is looking for the \nvvm\libdevice\ subpath beneath it. This means that I can't just set a different value in XLA_FLAGS to point to the actual location of the libdevice file because, to coin a phrase, it's not (just) the file it's looking for.
The debug info earlier:
2021-08-05 08:38:52.889213: W tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:73] Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice. This may result in compilation or runtime failures, if the program we try to run uses routines from libdevice.
2021-08-05 08:38:52.896033: W tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:74] Searched for CUDA in the following directories:
2021-08-05 08:38:52.899128: W tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:77] C:/Users/Julian/anaconda3/envs/TF250_PY395_xeus/Library/bin
2021-08-05 08:38:52.902510: W tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:77] C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.2
2021-08-05 08:38:52.905815: W tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:77] .
is incorrect insofar as there is no "CUDA" in the search path; and FWIW I think a different error should have been given for searching in C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.2, since there is no such folder (there's an old v10.0 folder there, but no OS install of CUDA 11).
Until/unless path handling is improved by TensorFlow, such file structure manipulation is needed in every new (Anaconda) Python environment.
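To that end, one way to script the workaround per environment (only a sketch of the file manipulation described above, not an official fix; it assumes the Windows conda Library\bin layout and derives the paths from sys.prefix so it can be reused in any new env):
import os
import shutil
import sys

bin_dir = os.path.join(sys.prefix, "Library", "bin")   # Windows conda env layout
dst_dir = os.path.join(bin_dir, "nvvm", "libdevice")   # the subpath XLA actually wants

os.makedirs(dst_dir, exist_ok=True)
shutil.copy(os.path.join(bin_dir, "libdevice.10.bc"), dst_dir)
os.environ["XLA_FLAGS"] = "--xla_gpu_cuda_data_dir=" + bin_dir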
Full thread in TensorFlow forum here
Upvotes: 6