Random User

Reputation: 21

I cannot get Tensorflow 2.0 to work on my GPU

I have been writing TensorFlow programs on my computer, which runs Linux Mint. For whatever reason, I can't get TensorFlow to run on my GPU. I get these messages:

2021-04-26 15:46:11.462612: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2021-04-26 15:46:11.462650: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.

I know for a fact that I have CUDA installed, because for PyTorch, the GPU works fine:

import torch
mydevice = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(mydevice)

yields

cuda

Also, when I run a program with TensorFlow, I get:

START TIME:  Mon Apr 26 16:34:24 2021
2021-04-26 16:34:24.499178: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-04-26 16:34:24.499862: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2021-04-26 16:34:24.526372: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-04-26 16:34:24.526781: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: GeForce GTX 1650 computeCapability: 7.5
coreClock: 1.56GHz coreCount: 16 deviceMemorySize: 3.82GiB deviceMemoryBandwidth: 119.24GiB/s
2021-04-26 16:34:24.526900: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2021-04-26 16:34:24.526986: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcublas.so.11'; dlerror: libcublas.so.11: cannot open shared object file: No such file or directory
2021-04-26 16:34:24.527069: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcublasLt.so.11'; dlerror: libcublasLt.so.11: cannot open shared object file: No such file or directory
2021-04-26 16:34:24.528676: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-04-26 16:34:24.528994: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-04-26 16:34:24.530990: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
2021-04-26 16:34:24.531125: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcusparse.so.11'; dlerror: libcusparse.so.11: cannot open shared object file: No such file or directory
2021-04-26 16:34:24.531230: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudnn.so.8'; dlerror: libcudnn.so.8: cannot open shared object file: No such file or directory
2021-04-26 16:34:24.531245: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1757] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2021-04-26 16:34:24.531641: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-04-26 16:34:24.532140: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-04-26 16:34:24.532178: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-04-26 16:34:24.532192: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267]      
2021-04-26 16:34:24.592917: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
2021-04-26 16:34:24.593369: I tensorflow/core/platform/profile_utils/cpu_utils.cc:112] CPU Frequency: 2400000000 Hz

I installed TensorFlow in Anaconda using conda, though I believe the build is from PyPI. Please let me know your suggestions. Thank you.

Upvotes: 2

Views: 4122

Answers (2)

David Tang

Reputation: 75

Your logs show that TensorFlow IS detecting your GPU (GTX 1650). The problem is that your cudatoolkit and cuDNN versions are probably incompatible with your TensorFlow version; TF is rather strict about these requirements. The lines to take note of are these:

2021-04-26 16:34:24.526900: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library **'libcudart.so.11.0'**; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2021-04-26 16:34:24.526986: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library **'libcublas.so.11'**; dlerror: libcublas.so.11: cannot open shared object file: No such file or directory
2021-04-26 16:34:24.527069: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library **'libcublasLt.so.11'**; dlerror: libcublasLt.so.11: cannot open shared object file: No such file or directory
2021-04-26 16:34:24.528676: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10

2021-04-26 16:34:24.531125: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library **'libcusparse.so.11'**; dlerror: libcusparse.so.11: cannot open shared object file: No such file or directory
2021-04-26 16:34:24.531230: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library **'libcudnn.so.8'**; dlerror: libcudnn.so.8: cannot open shared object file: No such file or directory

The latest TensorFlow release, tensorflow-2.4.0 (see full table), only plays nicely with cuDNN 8.0 and CUDA 11.0. Newer versions of both have already been released, so check which versions you have installed; I think you might be using CUDA 10.
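
If the log is long, the missing library names can be pulled out with a few lines of Python. This is only a sketch: the function name `missing_libraries` is made up for illustration, and the pattern assumes the warning format shown in the log above. The version suffix of each name (for example `.so.11.0` or `.so.8`) tells you which CUDA and cuDNN versions this TensorFlow build is looking for.

```python
import re

# Pattern based on the dso_loader warning format shown in the log above.
MISSING_LIB = re.compile(r"Could not load dynamic library '([^']+)'")


def missing_libraries(log_text):
    """Return the sorted, de-duplicated library names TensorFlow failed to load."""
    return sorted(set(MISSING_LIB.findall(log_text)))


# Three lines copied from the question's log: two warnings and one success.
sample_log = (
    "2021-04-26 16:34:24.526900: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] "
    "Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: "
    "cannot open shared object file: No such file or directory\n"
    "2021-04-26 16:34:24.528676: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] "
    "Successfully opened dynamic library libcufft.so.10\n"
    "2021-04-26 16:34:24.531230: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] "
    "Could not load dynamic library 'libcudnn.so.8'; dlerror: libcudnn.so.8: "
    "cannot open shared object file: No such file or directory\n"
)

print(missing_libraries(sample_log))
```

To run it on your own output, save the log to a file and pass the file's contents to `missing_libraries` instead of `sample_log`.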

I would suggest having a look at this post (older but the commands and principles still apply).


To create a new conda environment for TensorFlow:

  1. Make a yaml file (example yaml file for tensorflow)
  2. Create a new environment for TensorFlow using the above yaml file

conda env create -f environment.yml

  3. Activate your new environment

conda activate tensorflow_env_388

NB: a fresh environment avoids conflicts with packages that are already installed.


To troubleshoot and check what is currently installed:

conda list cudnn

# packages in environment at /rds/general/user/home/anaconda3/envs/tensorflow_env_388:
#
# Name                    Version                   Build  Channel
cudnn                     7.0.5.39             ha5ca753_1    conda-forge

conda list cudatoolkit

Then install cuDNN/CUDA as necessary:

conda install cudatoolkit=11.0

conda install cudnn=8.0
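
After installing, you can roughly check whether the dynamic loader now finds the libraries that the warnings complained about. The sketch below uses only the standard library; `REQUIRED_LIBS` and `can_load` are names I made up, and the list is taken from the warnings in the question. It only approximates TensorFlow's own lookup, so treat a "found" as a good sign rather than a guarantee. Run it with the Python interpreter from the activated environment.

```python
import ctypes

# Library names taken from the "Could not load dynamic library" warnings
# in the question; adjust the list if your log names different ones.
REQUIRED_LIBS = [
    "libcudart.so.11.0",
    "libcublas.so.11",
    "libcublasLt.so.11",
    "libcusparse.so.11",
    "libcudnn.so.8",
]


def can_load(soname):
    """Return True if the dynamic loader can open the given shared library."""
    try:
        ctypes.CDLL(soname)
        return True
    except OSError:
        # Raised when the library cannot be found or opened.
        return False


if __name__ == "__main__":
    for name in REQUIRED_LIBS:
        status = "found" if can_load(name) else "MISSING"
        print(f"{name}: {status}")
```

Any library still reported as MISSING is a candidate for a version mismatch or for a directory that is not on the loader's search path.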

Upvotes: 1

James

Reputation: 36608

What channel did you install it from? If you are using the default channel, you have to specify the GPU version of tensorflow.

conda install tensorflow=2.4.*=gpu* -c anaconda 
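
To see which kind of build you ended up with, something like the following may help. It is a sketch, not an official diagnostic: `tensorflow_gpu_report` is a hypothetical helper name, and it relies on `tf.test.is_built_with_cuda()` and `tf.config.list_physical_devices("GPU")` being available, which I believe they are in TensorFlow 2.4.

```python
import importlib.util


def tensorflow_gpu_report():
    """Summarise whether the installed TensorFlow is a CUDA build and sees a GPU."""
    if importlib.util.find_spec("tensorflow") is None:
        # TensorFlow is not installed in this environment.
        return {"installed": False}

    import tensorflow as tf

    return {
        "installed": True,
        "version": tf.__version__,
        # False here means a CPU-only build was installed.
        "built_with_cuda": tf.test.is_built_with_cuda(),
        # An empty list means no GPU was registered (e.g. libraries missing).
        "gpus": [device.name for device in tf.config.list_physical_devices("GPU")],
    }


if __name__ == "__main__":
    print(tensorflow_gpu_report())
```

If `built_with_cuda` is True but `gpus` is empty, the build itself is fine and the problem is more likely the missing CUDA/cuDNN libraries.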

Upvotes: 0
