Nyxeria

Reputation: 383

Get TensorFlow and Keras to run on GPU

I'm trying to get my model to train on the GPU, but I seem to be having problems doing so.

My OS is Windows 10.

I'm running Python 3.8

I installed tensorflow-gpu==2.2.0rc3 using pip3.8

I followed the instructions at https://www.tensorflow.org/install/gpu and now have the newest Nvidia drivers (something like 455), CUDA 10.1, and cuDNN 7.6.5 (for CUDA 10.1). My GPU is an Nvidia GeForce GTX 1080 Ti (so it should be CUDA-compatible) and my CPU is an AMD Threadripper 1950.

I set the PATH variable as stated in the instructions.

I run the following:

from tensorflow.python.client import device_lib
device_lib.list_local_devices()

and get following output:

[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 15407979308826898993
, name: "/device:XLA_CPU:0"
device_type: "XLA_CPU"
memory_limit: 17179869184
locality {
}
incarnation: 5613254095321737619
physical_device_desc: "device: XLA_CPU device"
, name: "/device:XLA_GPU:0"
device_type: "XLA_GPU"
memory_limit: 17179869184
locality {
}
incarnation: 11129728864475112108
physical_device_desc: "device: XLA_GPU device"
]

So I clearly have some "XLA_GPU" in there somewhere.
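
For what it's worth, on TensorFlow 2.x there is a more direct check (a minimal sketch, assuming the tensorflow-gpu 2.2 install described above):

import tensorflow as tf

# True only if this TensorFlow binary was built with CUDA support;
# False would mean a CPU-only package ended up being installed.
print(tf.test.is_built_with_cuda())

# Lists the GPUs TensorFlow can actually place ops on; an empty list usually
# means the CUDA/cuDNN DLLs were not found or don't match the build.
print(tf.config.list_physical_devices('GPU'))

A usable GPU would show up here as a device with device_type='GPU', not only as an XLA_GPU device.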

Now I have a program that has been tested and works on CPU (Python 3.6, but with the non-GPU TensorFlow 2.x build). I try running it on the new system and it runs OK, except that the GPU doesn't seem to be in use: the CPU lights up to ~10% in Task Manager and the GPU doesn't appear to do anything.

I've removed the data loading and other unimportant parts from the following code, since I know it runs; it just runs on the wrong device:

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# data_input_size, training_inputs and training_outputs come from the
# data-loading code that was omitted here.
with tf.device("/GPU:0"):
    model = Sequential()
    model.add(Dense(units=30, activation=keras.layers.LeakyReLU(alpha=0.1), input_dim=data_input_size))
    model.add(Dense(units=5, activation=keras.layers.LeakyReLU(alpha=0.1)))
    model.add(Dense(units=5, activation=keras.layers.LeakyReLU(alpha=0.1)))
    model.add(Dense(units=2, activation="softmax"))

    model.compile(loss='categorical_crossentropy',
                  metrics=['accuracy'])

    model.fit(training_inputs, training_outputs, epochs=1000, batch_size=2, verbose=2)


loss_and_metrics = model.evaluate(validation_inputs, validation_outputs, batch_size=8, verbose=0)
print(loss_and_metrics)

Trying to set the device to "/XLA_GPU:0" or similar strings results in a crash (Unknown attribute: 'XLA_GPU' in '/XLA_GPU:0').
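
One way to see where ops actually land (a sketch using TF 2.x device placement logging, not something I have in the original script) is to enable placement logging before building the model:

import tensorflow as tf

# Log the device every op is placed on; if no usable GPU is found,
# everything is reported on .../device:CPU:0.
tf.debugging.set_log_device_placement(True)

With that enabled, every op prints the device it was assigned to.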

So the question is, what am I doing wrong?

Upvotes: 0

Views: 3926

Answers (1)

y.selivonchyk

Reputation: 9900

TensorFlow is often picky about the Python version. Try downgrading to Python 3.7 (or any other version listed as supported).

Also, you can try using conda, which makes it easy to pin a specific Python version and usually handles the CUDA/cuDNN dependencies more conveniently:

conda create -n tensorflow_gpu pip python=3.7
conda activate tensorflow_gpu
conda install tensorflow-gpu
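
Inside the activated environment, a quick smoke test (just a sketch, not specific to your model) would be something like:

import tensorflow as tf

# If the GPU is usable, this matmul runs on /device:GPU:0; otherwise
# placement either fails or falls back to CPU, depending on configuration.
with tf.device("/GPU:0"):
    a = tf.random.uniform((1000, 1000))
    b = tf.matmul(a, a)
print(b.device)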

Upvotes: 1
