Keras with Tensorflow backend does not use GPU

Question

I have installed Keras with the TensorFlow backend following these instructions:

library(keras)
install_keras(tensorflow = "gpu")

The installation went smoothly and I had no error message.

If I type:

k = backend()
sess = k$get_session()
sess$list_devices()

As far as I understand the output, my GPU seems to be recognized:

[[1]]
_DeviceAttributes(/job:localhost/replica:0/task:0/device:CPU:0, CPU, 268435456, 3277741456357329757)

[[2]]
_DeviceAttributes(/job:localhost/replica:0/task:0/device:XLA_CPU:0, XLA_CPU, 17179869184, 14524037525637335634)

[[3]]
_DeviceAttributes(/job:localhost/replica:0/task:0/device:XLA_GPU:0, XLA_GPU, 17179869184, 5788527260077506513)

My .profile file looks like this:

export CUDA_HOME=${CUDA_PATH}
export PATH="${CUDA_PATH}/bin:$PATH"
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${CUDA_PATH}/lib64/

I can also list all the Nvidia related packages:

[ben@Solgaleo ~]$ pacman -Qs nvidia*
local/cuda 10.2.89-3
    NVIDIA's GPU programming toolkit
local/cudnn 7.6.5.32-3
    NVIDIA CUDA Deep Neural Network library
local/lib32-nvidia-utils 440.59-1
    NVIDIA drivers utilities (32-bit)
local/libvdpau 1.3-1
    Nvidia VDPAU library
local/libxnvctrl 440.59-1
    NVIDIA NV-CONTROL X extension
local/nvidia 440.59-8
    NVIDIA drivers for linux
local/nvidia-settings 440.59-1
    Tool for configuring the NVIDIA graphics driver
local/nvidia-utils 440.59-1
    NVIDIA drivers utilities
local/nvtop 1.0.0-2
    An htop like monitoring tool for NVIDIA GPUs
local/opencl-nvidia 440.59-1
    OpenCL implemention for NVIDIA

But when I build a Keras model, some library files are not found:

library(keras)
mnist <- dataset_mnist()
x_train <- mnist$train$x
y_train <- mnist$train$y
x_test <- mnist$test$x
y_test <- mnist$test$y
# reshape
x_train <- array_reshape(x_train, c(nrow(x_train), 784))
x_test <- array_reshape(x_test, c(nrow(x_test), 784))
# rescale
x_train <- x_train / 255
x_test <- x_test / 255
y_train <- to_categorical(y_train, 10)
y_test <- to_categorical(y_test, 10)
model <- keras_model_sequential()
model %>%
  layer_dense(units = 256, activation = 'relu', input_shape = c(784)) %>%
  layer_dropout(rate = 0.4) %>%
  layer_dense(units = 128, activation = 'relu') %>%
  layer_dropout(rate = 0.3) %>%
  layer_dense(units = 10, activation = 'softmax')

Here is the error message:

2020-02-18 13:45:23.530693: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-02-18 13:45:23.609674: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-02-18 13:45:23.610276: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties: 
name: GeForce RTX 2070 SUPER major: 7 minor: 5 memoryClockRate(GHz): 1.77
pciBusID: 0000:09:00.0
2020-02-18 13:45:23.610420: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64/R/lib::/opt/cuda/lib64/:::/lib:/usr/lib/jvm/java-7-openjdk/jre/lib/amd64/server::/opt/cuda/lib64/
2020-02-18 13:45:23.610508: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcublas.so.10.0'; dlerror: libcublas.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64/R/lib::/opt/cuda/lib64/:::/lib:/usr/lib/jvm/java-7-openjdk/jre/lib/amd64/server::/opt/cuda/lib64/
2020-02-18 13:45:23.610597: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcufft.so.10.0'; dlerror: libcufft.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64/R/lib::/opt/cuda/lib64/:::/lib:/usr/lib/jvm/java-7-openjdk/jre/lib/amd64/server::/opt/cuda/lib64/
2020-02-18 13:45:23.610680: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcurand.so.10.0'; dlerror: libcurand.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64/R/lib::/opt/cuda/lib64/:::/lib:/usr/lib/jvm/java-7-openjdk/jre/lib/amd64/server::/opt/cuda/lib64/
2020-02-18 13:45:23.610761: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcusolver.so.10.0'; dlerror: libcusolver.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64/R/lib::/opt/cuda/lib64/:::/lib:/usr/lib/jvm/java-7-openjdk/jre/lib/amd64/server::/opt/cuda/lib64/
2020-02-18 13:45:23.610842: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcusparse.so.10.0'; dlerror: libcusparse.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64/R/lib::/opt/cuda/lib64/:::/lib:/usr/lib/jvm/java-7-openjdk/jre/lib/amd64/server::/opt/cuda/lib64/
2020-02-18 13:45:23.646497: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-02-18 13:45:23.646508: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1641] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2020-02-18 13:45:23.647124: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-02-18 13:45:23.669292: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3794460000 Hz
2020-02-18 13:45:23.670124: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x559b541a72c0 executing computations on platform Host. Devices:
2020-02-18 13:45:23.670138: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): Host, Default Version
2020-02-18 13:45:23.670530: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-02-18 13:45:23.670542: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]      
2020-02-18 13:45:23.982097: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-02-18 13:45:23.982507: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x559b5423b030 executing computations on platform CUDA. Devices:
2020-02-18 13:45:23.982529: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): GeForce RTX 2070 SUPER, Compute Capability 7.5

And indeed, libcudart.so.10.0 (for instance) cannot be found, because it's not there:

[ben@Solgaleo ~]$ ll /opt/cuda/lib64/libcudart.so*
lrwxrwxrwx 1 root root   20 Dec 31 09:07 /opt/cuda/lib64/libcudart.so -> libcudart.so.10.2.89
lrwxrwxrwx 1 root root   20 Dec 31 09:07 /opt/cuda/lib64/libcudart.so.10 -> libcudart.so.10.2.89
lrwxrwxrwx 1 root root   20 Dec 31 09:07 /opt/cuda/lib64/libcudart.so.10.2 -> libcudart.so.10.2.89
-rwxr-xr-x 1 root root 498K Dec 31 09:07 /opt/cuda/lib64/libcudart.so.10.2.89

So TensorFlow is looking for version 10.0, while I have the 10.2 installed.

And when training my model, only the CPU is used.

What did I mess up with Keras/TensorFlow installation? How can I fix this?

Edit: Here are the versions of Keras and TensorFlow R packages:

keras_2.2.5.0
tensorflow_2.0.0

Keras with Tensorflow backend does not use GPU

Answers (1)

Related Questions