Ben

Reputation: 6751

Keras with TensorFlow backend does not use GPU

I have installed Keras with the TensorFlow backend following these instructions:

library(keras)
install_keras(tensorflow = "gpu")

The installation went smoothly and I got no error messages.

If I type:

k = backend()
sess = k$get_session()
sess$list_devices()

As far as I understand the output, my GPU seems to be recognized:

[[1]]
_DeviceAttributes(/job:localhost/replica:0/task:0/device:CPU:0, CPU, 268435456, 3277741456357329757)

[[2]]
_DeviceAttributes(/job:localhost/replica:0/task:0/device:XLA_CPU:0, XLA_CPU, 17179869184, 14524037525637335634)

[[3]]
_DeviceAttributes(/job:localhost/replica:0/task:0/device:XLA_GPU:0, XLA_GPU, 17179869184, 5788527260077506513)

My .profile file looks like this:

export CUDA_HOME=${CUDA_PATH}
export PATH="${CUDA_PATH}/bin:$PATH"
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${CUDA_PATH}/lib64/

I can also list all the Nvidia related packages:

[ben@Solgaleo ~]$ pacman -Qs nvidia*
local/cuda 10.2.89-3
    NVIDIA's GPU programming toolkit
local/cudnn 7.6.5.32-3
    NVIDIA CUDA Deep Neural Network library
local/lib32-nvidia-utils 440.59-1
    NVIDIA drivers utilities (32-bit)
local/libvdpau 1.3-1
    Nvidia VDPAU library
local/libxnvctrl 440.59-1
    NVIDIA NV-CONTROL X extension
local/nvidia 440.59-8
    NVIDIA drivers for linux
local/nvidia-settings 440.59-1
    Tool for configuring the NVIDIA graphics driver
local/nvidia-utils 440.59-1
    NVIDIA drivers utilities
local/nvtop 1.0.0-2
    An htop like monitoring tool for NVIDIA GPUs
local/opencl-nvidia 440.59-1
    OpenCL implemention for NVIDIA

But when I build a Keras model, some library files are not found:

library(keras)
mnist <- dataset_mnist()
x_train <- mnist$train$x
y_train <- mnist$train$y
x_test <- mnist$test$x
y_test <- mnist$test$y
# reshape
x_train <- array_reshape(x_train, c(nrow(x_train), 784))
x_test <- array_reshape(x_test, c(nrow(x_test), 784))
# rescale
x_train <- x_train / 255
x_test <- x_test / 255
y_train <- to_categorical(y_train, 10)
y_test <- to_categorical(y_test, 10)
model <- keras_model_sequential()
model %>%
  layer_dense(units = 256, activation = 'relu', input_shape = c(784)) %>%
  layer_dropout(rate = 0.4) %>%
  layer_dense(units = 128, activation = 'relu') %>%
  layer_dropout(rate = 0.3) %>%
  layer_dense(units = 10, activation = 'softmax')

Here is the error message:

2020-02-18 13:45:23.530693: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-02-18 13:45:23.609674: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-02-18 13:45:23.610276: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties: 
name: GeForce RTX 2070 SUPER major: 7 minor: 5 memoryClockRate(GHz): 1.77
pciBusID: 0000:09:00.0
2020-02-18 13:45:23.610420: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64/R/lib::/opt/cuda/lib64/:::/lib:/usr/lib/jvm/java-7-openjdk/jre/lib/amd64/server::/opt/cuda/lib64/
2020-02-18 13:45:23.610508: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcublas.so.10.0'; dlerror: libcublas.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64/R/lib::/opt/cuda/lib64/:::/lib:/usr/lib/jvm/java-7-openjdk/jre/lib/amd64/server::/opt/cuda/lib64/
2020-02-18 13:45:23.610597: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcufft.so.10.0'; dlerror: libcufft.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64/R/lib::/opt/cuda/lib64/:::/lib:/usr/lib/jvm/java-7-openjdk/jre/lib/amd64/server::/opt/cuda/lib64/
2020-02-18 13:45:23.610680: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcurand.so.10.0'; dlerror: libcurand.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64/R/lib::/opt/cuda/lib64/:::/lib:/usr/lib/jvm/java-7-openjdk/jre/lib/amd64/server::/opt/cuda/lib64/
2020-02-18 13:45:23.610761: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcusolver.so.10.0'; dlerror: libcusolver.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64/R/lib::/opt/cuda/lib64/:::/lib:/usr/lib/jvm/java-7-openjdk/jre/lib/amd64/server::/opt/cuda/lib64/
2020-02-18 13:45:23.610842: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcusparse.so.10.0'; dlerror: libcusparse.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64/R/lib::/opt/cuda/lib64/:::/lib:/usr/lib/jvm/java-7-openjdk/jre/lib/amd64/server::/opt/cuda/lib64/
2020-02-18 13:45:23.646497: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-02-18 13:45:23.646508: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1641] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2020-02-18 13:45:23.647124: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-02-18 13:45:23.669292: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3794460000 Hz
2020-02-18 13:45:23.670124: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x559b541a72c0 executing computations on platform Host. Devices:
2020-02-18 13:45:23.670138: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): Host, Default Version
2020-02-18 13:45:23.670530: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-02-18 13:45:23.670542: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]      
2020-02-18 13:45:23.982097: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-02-18 13:45:23.982507: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x559b5423b030 executing computations on platform CUDA. Devices:
2020-02-18 13:45:23.982529: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): GeForce RTX 2070 SUPER, Compute Capability 7.5

And indeed, libcudart.so.10.0 (for instance) cannot be found, because it's not there:

[ben@Solgaleo ~]$ ll /opt/cuda/lib64/libcudart.so*
lrwxrwxrwx 1 root root   20 Dec 31 09:07 /opt/cuda/lib64/libcudart.so -> libcudart.so.10.2.89
lrwxrwxrwx 1 root root   20 Dec 31 09:07 /opt/cuda/lib64/libcudart.so.10 -> libcudart.so.10.2.89
lrwxrwxrwx 1 root root   20 Dec 31 09:07 /opt/cuda/lib64/libcudart.so.10.2 -> libcudart.so.10.2.89
-rwxr-xr-x 1 root root 498K Dec 31 09:07 /opt/cuda/lib64/libcudart.so.10.2.89

So TensorFlow is looking for version 10.0, while I have version 10.2 installed.
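To see every library affected at a glance, rather than reading the warnings one by one, the log can be filtered for the "Could not load dynamic library" lines. This is only a minimal sketch: the function name missing_tf_libs is made up for illustration, and it assumes the log text has been saved or piped in exactly as TensorFlow printed it.

```shell
# Hypothetical helper (not part of TensorFlow or Keras).
# Reads TensorFlow log text on stdin and prints each library that
# dso_loader reported as missing, one per line, without duplicates.
missing_tf_libs() {
  grep -o "Could not load dynamic library '[^']*'" \
    | cut -d"'" -f2 \
    | sort -u
}
```

With the log saved to a file (say tf.log, a placeholder name), missing_tf_libs < tf.log lists the .so names to compare against what ll /opt/cuda/lib64/ shows.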

And when training my model, only the CPU is used.

What did I get wrong in the Keras/TensorFlow installation? How can I fix this?

Edit: Here are the versions of Keras and TensorFlow R packages:

keras_2.2.5.0
tensorflow_2.0.0

Upvotes: 1

Views: 1300

Answers (1)

nickyfot

Reputation: 2019

Adding a partial answer to the question based on what we discussed in the comments (I have no idea how to resolve some of the errors, but maybe someone can add to this).

In the original question, it looks like TensorFlow can only use the CPU. The device list contains no regular GPU device, only the XLA entries and the CPU: _DeviceAttributes(/job:localhost/replica:0/task:0/device:CPU:0, CPU, 268435456, 3277741456357329757)

TensorFlow builds are tied to specific CUDA versions, so you need to make sure the two are compatible. TF 2.0 expects CUDA 10.0, which is why it is looking for libcudart.so.10.0 and the other .so.10.0 libraries while you have 10.2 installed. One option is to upgrade to TF 2.1 and downgrade CUDA to 10.1: use install_tensorflow(version = "2.1.0") in R, and yaourt cuda-10.1 on Arch Linux to get the right version of CUDA with all its dependencies.
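The compatibility check described above can be sketched as a small shell lookup. This is an illustration only: both function names are hypothetical, and the table covers just the two pairings mentioned in this answer (TF 2.0 with CUDA 10.0, TF 2.1 with CUDA 10.1), so any other version should be checked against the official TensorFlow install guide.

```shell
# Hypothetical helpers for illustration; not part of any package.

# Print the CUDA version a given TensorFlow release expects.
# Only the pairings discussed in this thread are listed.
expected_cuda_for_tf() {
  case "$1" in
    2.0.*) echo "10.0" ;;
    2.1.*) echo "10.1" ;;
    *)     echo "unknown" ;;
  esac
}

# Succeed (exit status 0) only if the installed CUDA version, e.g.
# "10.2.89", equals the expected major.minor or starts with it
# followed by a dot.
cuda_matches_tf() {
  expected=$(expected_cuda_for_tf "$1")
  case "$2" in
    "$expected"|"$expected".*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For the setup in the question, cuda_matches_tf 2.0.0 10.2.89 fails, which matches the "Could not load dynamic library" warnings in the log.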

Since CUDA 10.0 you also need to install the TensorRT dependencies to use some of the acceleration features that TensorFlow relies on. For this, download the TensorRT package from the NVIDIA developer downloads (account needed) and install it using the AUR.

As for the progbar error, I am not 100% sure, as I haven't seen it before, but it looks like it could be related to TensorBoard, so please make sure you have an appropriate version of that installed as well.

Upvotes: 1
