silence_lamb

Reputation: 377

How to let TensorFlow XLA know the CUDA path

I installed the TensorFlow nightly build with the command pip install tf-nightly-gpu --prefix=/tf/install/path

When I try to run any XLA example, TensorFlow reports the error "Unable to find libdevice dir. Using '.' Failed to compile ptx to cubin. Will attempt to let GPU driver compile the ptx. Not found: /usr/local/cuda-10.0/bin/ptxas not found".

So apparently TensorFlow cannot find my CUDA path. On my system, CUDA is installed in /cm/shared/apps/cuda/toolkit/10.0.130. Since I didn't build TensorFlow from source, XLA searches /usr/local/cuda-* by default. I do not have that folder, so it reports an error.

Currently my workaround is to create a symbolic link. I checked the TensorFlow source code in tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc, which contains the comment "// CUDA location explicitly specified by user via --xla_gpu_cuda_data_dir has highest priority." How do I pass a value to this flag? I tried the following two environment variables, but neither works:

export XLA_FLAGS="--xla_gpu_cuda_data_dir=/cm/shared/apps/cuda10.0/toolkit/10.0.130/"
export TF_XLA_FLAGS="--xla_gpu_cuda_data_dir=/cm/shared/apps/cuda10.0/toolkit/10.0.130/"

So how do I use the flag "--xla_gpu_cuda_data_dir"? Thanks.

Upvotes: 6

Views: 17936

Answers (3)

Antti Rytsölä

Reputation: 1545

This worked for me.

tensorflow                2.11.0          gpu_py310hf8ff8df_0  
ii  nvidia-dkms-525                 525.105.17-0ubuntu0.22.04.1             amd64        NVIDIA DKMS package
ii  nvidia-driver-525               525.105.17-0ubuntu0.22.04.1             amd64        NVIDIA driver metapackage
nvidia-cuda-toolkit not installed

NVIDIA T4 on GCE, Ubuntu 22.04 LTS minimal

conda install -c nvidia cuda-nvcc

ln -s /path/to/conda-env/lib/libdevice.10.bc .
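The symlink in the working directory is consistent with the error message in the question ("Unable to find libdevice dir. Using '.'"), which suggests XLA falls back to the current directory when it cannot locate libdevice. A minimal Python sketch for checking whether a candidate directory looks usable as a CUDA data dir follows. `check_cuda_data_dir` is a hypothetical helper, not a TensorFlow API. The `bin/ptxas` location matches the error message in the question; the `nvvm/libdevice/libdevice.10.bc` location is an assumption based on the usual CUDA toolkit layout and may differ between versions.

```python
from pathlib import Path

# Relative paths assumed to be looked up under the CUDA data dir.
# These are assumptions for illustration, not taken from a TensorFlow API.
EXPECTED_FILES = (
    Path("nvvm") / "libdevice" / "libdevice.10.bc",
    Path("bin") / "ptxas",
)


def check_cuda_data_dir(cuda_dir):
    """Return the expected files (as relative paths) missing under cuda_dir."""
    root = Path(cuda_dir)
    return [str(rel) for rel in EXPECTED_FILES if not (root / rel).is_file()]
```

Calling `check_cuda_data_dir("/path/to/conda-env")` returns an empty list when both files are present, and otherwise lists what is missing.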

I couldn't get XLA_FLAGS to work:

2023-04-21 09:17:00.947644: F tensorflow/compiler/xla/parse_flags_from_env.cc:226] Unknown flags in XLA_FLAGS: -–xla_gpu_cuda_data_dir=/home/rac/fulltf2/fullcuda.env/lib 
 Perhaps you meant to specify these on the TF_XLA_FLAGS envvar?
Aborted (core dumped)
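The flag in the log above appears to start with a hyphen followed by an en dash (U+2013) rather than two ASCII hyphens, which may be why it was rejected as unknown. This is only a guess from the pasted text, which could also have been altered by page formatting. A small Python sketch for spotting such characters in a flags string before exporting it (`find_non_ascii` is a hypothetical helper, and `/path/to/cuda` is a placeholder):

```python
def find_non_ascii(flags):
    """Return (index, character) pairs for every non-ASCII character in flags."""
    return [(i, ch) for i, ch in enumerate(flags) if ord(ch) > 127]


# Flag spelled as in the log: a hyphen followed by an en dash (U+2013).
suspect = "-\u2013xla_gpu_cuda_data_dir=/path/to/cuda"
# Flag spelled with two ASCII hyphens.
clean = "--xla_gpu_cuda_data_dir=/path/to/cuda"

print(find_non_ascii(suspect))
print(find_non_ascii(clean))
```

An empty result means the string contains only ASCII characters; it does not by itself confirm that the flag name is one XLA recognizes.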

Upvotes: 2

user14653986

Reputation: 111

You can run export XLA_FLAGS=--xla_gpu_cuda_data_dir=/path/to/cuda in a terminal.
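The same variable can also be set from Python, as long as that happens before TensorFlow is imported. The assumption here is that XLA reads `XLA_FLAGS` from the process environment when it first parses its flags, so setting it later may have no effect. A minimal sketch, with `with_cuda_data_dir` as a hypothetical helper and `/path/to/cuda` as a placeholder:

```python
import os


def with_cuda_data_dir(existing_flags, cuda_dir):
    """Return an XLA_FLAGS value that sets --xla_gpu_cuda_data_dir to cuda_dir."""
    # Drop any earlier setting of the same flag and keep everything else.
    kept = [
        flag
        for flag in existing_flags.split()
        if not flag.startswith("--xla_gpu_cuda_data_dir=")
    ]
    kept.append("--xla_gpu_cuda_data_dir=" + cuda_dir)
    return " ".join(kept)


# Placeholder path; replace it with the real CUDA toolkit root.
os.environ["XLA_FLAGS"] = with_cuda_data_dir(
    os.environ.get("XLA_FLAGS", ""), "/path/to/cuda"
)

# Import TensorFlow only after the variable is set, for example:
# import tensorflow as tf
```

The helper preserves any other flags already present in `XLA_FLAGS`, so it can be combined with an existing environment setup.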

Upvotes: 11

Harry Yoo

Reputation: 341

There is a code change for this issue, but it is not clear how to use it. See https://github.com/tensorflow/tensorflow/issues/23783

Upvotes: 1
