darki73

Reputation: 1127

TensorFlow ignores the RTX 3000 series GPU

I am trying to train my model on an RTX 3090 GPU.
To be able to use it at all, I had to install TensorFlow==2.4.0-rc0; however, there is a problem with actually using the GPU.

(Yes, I have downclocked the memory because it gets really toasty at the stock 19.5 GHz; that is why the memory bandwidth is 60 Gbps lower.)

First of all, it detects the GPU and reports:

tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce RTX 3090 computeCapability: 8.6
coreClock: 1.695GHz coreCount: 82 deviceMemorySize: 24.00GiB deviceMemoryBandwidth: 871.81GiB/s

Then it says:

Adding visible gpu devices: 0

But a couple of lines below that message, this message is displayed:

Created TensorFlow device 
(/job:localhost/replica:0/task:0/device:GPU:0 with 21821 MB memory) -> 
physical GPU (device: 0, name: GeForce RTX 3090, pci bus id: 0000:01:00.0, compute capability: 8.6)

After that it just keeps hammering the CPU and does not appear to use the GPU at all. Most importantly, when training is done purely on the CPU, one epoch takes around 80 seconds; when the GPU is used, it cannot complete even a single epoch.


This is the complete text output of my Jupyter Notebook (when it is running)

[I 04:06:47.194 NotebookApp] Kernel started: e4bec12d-3d85-4019-9b5a-67d34a45acfc
[I 04:06:50.799 NotebookApp] Starting buffering for e4bec12d-3d85-4019-9b5a-67d34a45acfc:591585a545fe4d33977dac034060b33c
[I 04:06:51.031 NotebookApp] Kernel restarted: e4bec12d-3d85-4019-9b5a-67d34a45acfc
[I 04:06:51.557 NotebookApp] Restoring connection for e4bec12d-3d85-4019-9b5a-67d34a45acfc:591585a545fe4d33977dac034060b33c
[I 04:06:51.558 NotebookApp] Replaying 3 buffered messages
2020-11-06 04:06:53.766169: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
2020-11-06 04:07:01.412837: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2020-11-06 04:07:01.420283: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library nvcuda.dll
2020-11-06 04:07:01.438547: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce RTX 3090 computeCapability: 8.6
coreClock: 1.695GHz coreCount: 82 deviceMemorySize: 24.00GiB deviceMemoryBandwidth: 871.81GiB/s
2020-11-06 04:07:01.438675: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
2020-11-06 04:07:01.450544: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll
2020-11-06 04:07:01.450698: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublasLt64_11.dll
2020-11-06 04:07:01.453610: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cufft64_10.dll
2020-11-06 04:07:01.454496: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library curand64_10.dll
2020-11-06 04:07:01.457436: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusolver64_10.dll
2020-11-06 04:07:01.459702: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusparse64_11.dll
2020-11-06 04:07:01.460296: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudnn64_8.dll
2020-11-06 04:07:01.460439: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2020-11-06 04:07:01.461093: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2020-11-06 04:07:01.461751: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce RTX 3090 computeCapability: 8.6
coreClock: 1.695GHz coreCount: 82 deviceMemorySize: 24.00GiB deviceMemoryBandwidth: 871.81GiB/s
2020-11-06 04:07:01.461854: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
2020-11-06 04:07:01.462144: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll
2020-11-06 04:07:01.462407: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublasLt64_11.dll
2020-11-06 04:07:01.462690: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cufft64_10.dll
2020-11-06 04:07:01.462941: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library curand64_10.dll
2020-11-06 04:07:01.464597: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusolver64_10.dll
2020-11-06 04:07:01.464843: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusparse64_11.dll
2020-11-06 04:07:01.465087: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudnn64_8.dll
2020-11-06 04:07:01.465348: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2020-11-06 04:07:01.838515: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-11-06 04:07:01.838596: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267]      0
2020-11-06 04:07:01.838999: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1280] 0:   N
2020-11-06 04:07:01.839431: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1406] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 21821 MB memory) -> physical GPU (device: 0, name: GeForce RTX 3090, pci bus id: 0000:01:00.0, compute capability: 8.6)
2020-11-06 04:07:01.842196: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2020-11-06 04:07:10.441807: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
2020-11-06 04:07:11.435159: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll
2020-11-06 04:07:12.026347: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublasLt64_11.dll
2020-11-06 04:07:12.044635: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudnn64_8.dll
[I 04:08:47.169 NotebookApp] Saving file at /train_model.ipynb
2020-11-06 04:13:24.212460: I tensorflow/stream_executor/cuda/cuda_blas.cc:1838] TensorFloat-32 will be used for the matrix multiplication. This will only be logged once.

P.S. Update #1
It took 579 seconds to complete a single epoch on the GPU, while it took only 80 seconds to complete one on the CPU.

Upvotes: 4

Views: 8467

Answers (4)

Rishabh

Reputation: 177

I had the same problem, so I installed tf-nightly-gpu 2.5.0.dev20210118.

It's back to normal.

When running your code, make sure that TensorFlow 2.5 is the version actually being imported.
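
A quick way to check this from the notebook is to print the version and install location of whatever TensorFlow the kernel imports. This is only a sketch: version_tuple is a helper name made up for this example, and the (2, 5) threshold simply mirrors the version mentioned above.

```python
def version_tuple(version):
    """Return (major, minor) from a version string such as '2.5.0.dev20210118'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


try:
    import tensorflow as tf

    # The file path shows which environment the package was loaded from.
    print("TensorFlow", tf.__version__, "loaded from", tf.__file__)
    if version_tuple(tf.__version__) < (2, 5):
        print("Warning: this kernel is importing a TensorFlow older than 2.5")
except ImportError:
    print("TensorFlow cannot be imported from this interpreter")
```

If the printed path points at a different conda env than the one you installed the nightly build into, the notebook kernel is the thing to fix, not the GPU setup.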

To make execution faster, you can also allocate GPU memory explicitly.

Since I am using an RTX 3090, I am allocating 22 GB:

import tensorflow as tf

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    # Restrict TensorFlow to only allocate 22GB of memory on the first GPU
    # (memory_limit is given in MB)
    try:
        tf.config.experimental.set_virtual_device_configuration(
            gpus[0],
            [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=22000)])
        logical_gpus = tf.config.experimental.list_logical_devices('GPU')
        print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
    except RuntimeError as e:
        # Virtual devices must be set before GPUs have been initialized
        print(e)

You can refer to the TensorFlow documentation on GPU tweaks: https://www.tensorflow.org/guide/gpu

Upvotes: 0

Sulandir

Reputation: 3

After encountering similar issues myself yesterday, I decided to try the information from some blog posts [1][2] and simply installed the versions that are supposedly compatible with the RTX 3090 (either check the second link or the official compatibility matrix):

  • CUDA 11.1
  • cuDNN 8.4.0.30

I am using Windows 10 and running Python 3.8.6 in a conda env. I then installed the latest TensorFlow snapshot (not the stable version), tf-nightly-gpu=2.5.0.dev20201110. This caused me to run into errors similar to those reported on the TensorFlow issue tracker [3]:

Could not load dynamic library 'cusolver64_10.dll'; dlerror: cusolver64_10.dll not found

This was solved by installing CUDA 10.2 ON TOP of the CUDA 11 installation. Note that no additional cuDNN version for CUDA 10.2 was installed; CUDA 10.2 is installed only to provide the missing file (cusolver64_10.dll does not exist in version 11, but cusolver64_11.dll does). Some users on the issue implied that moving the missing DLL into the bin folder of the 11.1 installation also did the trick, but installing 10.2 on top of 11 works just fine. There are now three CUDA path variables in Windows: CUDA_PATH, which is set to version 10 and can safely be changed back, and one CUDA path variable for each installed version. TF will try to load the DLLs from the 11.1 version and then look for the missing DLL in the lower version's path.
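
To see which of the two DLLs is currently reachable, a small script can scan the directories on PATH. This is a rough sketch, assuming the CUDA bin folders are on PATH (the CUDA installer normally adds them). find_library is a helper name invented for this example, and it only checks that the files exist, not that TensorFlow can actually load them.

```python
import os
from pathlib import Path
from typing import List


def find_library(filename: str, search_dirs: List[str]) -> List[Path]:
    """Return every copy of `filename` found directly inside `search_dirs`."""
    hits = []
    for directory in search_dirs:
        candidate = Path(directory) / filename
        if candidate.is_file():
            hits.append(candidate)
    return hits


# Directories from PATH, in the order they are listed.
path_dirs = [d for d in os.environ.get("PATH", "").split(os.pathsep) if d]

for name in ("cusolver64_10.dll", "cusolver64_11.dll"):
    found = find_library(name, path_dirs)
    print(name, "->", [str(p) for p in found] if found else "not found on PATH")
```

If cusolver64_10.dll shows up only under the 10.2 folder and everything else under 11.1, that matches the setup described above.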

Does it work? At least I think it does. I am using TensorFlow Keras and TensorFlow to build models with the functional API, and my networks are running just fine, producing the expected results. The speedup is the one I expected from my tech leap.

Upvotes: 0

Vedant Joshi

Reputation: 41

It's because the RTX 3090 has the Ampere architecture and is compatible with CUDA 11 and cuDNN 8, while TensorFlow v2.3 does not yet cover the CUDA 11 requirements ...

I'm facing the same issue, but I've figured out that it's a compatibility issue, so waiting for v2.4 may be the best option. Otherwise, you can try compiling TensorFlow from source.

You can refer to: https://medium.com/@dun.chwong/the-simple-guide-deep-learning-with-rtx-3090-cuda-cudnn-tensorflow-keras-pytorch-e88a2a8249bc

Upvotes: 4

Theodore Popp

Reputation: 856

Adding visible gpu devices: 0 is misleading and actually means one device was added. The portion after the colon is a comma separated list of devices, not the number of devices.
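
To illustrate the point, here is a small sketch that pulls the device IDs out of such a log line. The function name visible_gpu_ids is made up for this example, and the two-GPU line is a hypothetical input, not something taken from the logs in the question.

```python
import re
from typing import List


def visible_gpu_ids(log_line: str) -> List[int]:
    """Return the device IDs listed after 'Adding visible gpu devices:'."""
    match = re.search(r"Adding visible gpu devices:\s*([\d,\s]+)", log_line)
    if match is None:
        return []
    return [int(part) for part in match.group(1).split(",") if part.strip()]


# Line from the question: one device, whose ID happens to be 0.
ids = visible_gpu_ids("gpu_device.cc:1862] Adding visible gpu devices: 0")
print(ids, len(ids))  # [0] 1

# Hypothetical two-GPU machine, for comparison.
ids = visible_gpu_ids("gpu_device.cc:1862] Adding visible gpu devices: 0, 1")
print(ids, len(ids))  # [0, 1] 2
```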

Setting an environment variable TF_CPP_MIN_VLOG_LEVEL=10 will show a lot of information, some of which might help you debug this case.
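
In a notebook, one way to do this is to set the variable from Python before TensorFlow is imported. This is a minimal sketch, assuming the variable has to be in the environment before TensorFlow's native library is loaded, so it belongs in the first cell after a kernel restart.

```python
import os

# Assumption: this must be set before the first `import tensorflow`,
# otherwise the C++ runtime may already have read its logging settings.
os.environ["TF_CPP_MIN_VLOG_LEVEL"] = "10"

try:
    import tensorflow as tf

    print("TensorFlow", tf.__version__, "imported with verbose C++ logging")
except ImportError:
    print("TensorFlow is not installed in this environment")
```

As with the log in the question, the C++ messages show up in the terminal running the Jupyter server rather than in the notebook cell. Expect a lot of output at level 10.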

Given that your logs show the device was available, the cuBLAS libraries were loaded, no other relevant error messages were shown, and there is a very noticeable timing change, the most likely answer is that TensorFlow is not ignoring your GPU and your model is just not optimized to run quickly on GPUs.

My recommended next steps would be looking at VLOGs to see if the GPU is being used for the execution of any ops. It's possible, though I think unlikely, that they would show there are library mismatch issues leading to the CPU still being used and not your GPU along with a slowdown while the process realizes this issue.
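
A lighter-weight check than reading VLOGs, offered here as an alternative and not as part of the original suggestion, is TensorFlow 2.x's device placement logging. The sketch below runs one matrix multiplication and reports where it executed. matmul_device and the 2048 matrix size are arbitrary choices for this example.

```python
try:
    import tensorflow as tf
except ImportError:  # keeps the sketch importable on machines without TensorFlow
    tf = None


def matmul_device(size=2048):
    """Run one matrix multiplication and return the device it executed on."""
    if tf is None:
        return None
    a = tf.random.uniform((size, size))
    b = tf.random.uniform((size, size))
    return tf.matmul(a, b).device


if tf is not None:
    # Log the device chosen for every op; call this before any op runs.
    tf.debugging.set_log_device_placement(True)
    print("Physical GPUs:", tf.config.list_physical_devices("GPU"))

print("matmul executed on:", matmul_device())
```

If the reported device ends in GPU:0, the GPU is being used and the slowdown is more likely a matter of how the model or input pipeline is set up.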

After confirming that the GPU is being used, I would advise looking here to confirm all ops you expect to be run on the GPU are and to debug why your model does not work well on a GPU: https://www.tensorflow.org/tensorboard/tensorboard_profiling_keras

Upvotes: 1
