edn

Reputation: 2183

tensorflow gpu is only running on CPU

I installed Anaconda Navigator on Windows 10 and all necessary Nvidia/CUDA packages, created a new environment called tensorflow-gpu-env, updated the PATH information, etc. When I run a model (built using tensorflow.keras), I see that CPU utilization increases significantly, GPU utilization is 0%, and the model just does not train.

I ran a couple of tests to check how things look:

print(tf.test.is_built_with_cuda())
True

The above output ('True') looks correct.
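Another check one could add here (just a sketch, assuming the TF 1.x API) is tf.test.is_gpu_available(), which should print True when a CUDA device is visible and usable:

import tensorflow as tf

# Assumption: TF 1.x; True means TensorFlow can actually create ops on a GPU
print(tf.test.is_gpu_available())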

Another try:

from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())

Output:

[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 1634313269296444741
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 1478485606
locality {
  bus_id: 1
  links {
  }
}
incarnation: 16493618810057409699
physical_device_desc: "device: 0, name: GeForce 940MX, pci bus id: 0000:01:00.0, compute capability: 5.0"
]

So far so good... Later in my code, I start the training with the following code:

history = merged_model.fit_generator(generator=train_generator,
                                     epochs=60,
                                     verbose=2,
                                     callbacks=[reduce_lr_on_plateau],
                                     validation_data=val_generator,
                                     use_multiprocessing=True,
                                     max_queue_size=50,
                                     workers=3)

I also tried to run the training as follows:

with tf.device('/gpu:0'):
    history = merged_model.fit_generator(generator=train_generator,
                                         epochs=60,
                                         verbose=2,
                                         callbacks=[reduce_lr_on_plateau],
                                         validation_data=val_generator,
                                         use_multiprocessing=True,
                                         max_queue_size=50,
                                         workers=3)

No matter how I start it, the training never begins; I keep seeing high CPU utilization and 0% GPU utilization.

Why is my tensorflow-gpu installation only using the CPU? I have spent hours on this with literally no progress.
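One diagnostic that could still be tried (a sketch, assuming the TF 1.x session API) is enabling device-placement logging, so the console shows which device each op actually lands on:

import tensorflow as tf

# Assumption: TF 1.x; log every op's device assignment to the console
config = tf.ConfigProto(log_device_placement=True)
sess = tf.Session(config=config)
tf.keras.backend.set_session(sess)  # make tf.keras use this session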

ADDENDUM

When I run conda list on the console, I see the following regarding tensorflow:

tensorflow-base           1.11.0          gpu_py36h6e53903_0
tensorflow-gpu            1.11.0                    <pip>

What is this tensorflow-base? Can it cause a problem? Before installing tensorflow-gpu, I made sure that I uninstalled tensorflow and tensorflow-gpu using both conda and pip; I then installed tensorflow-gpu using pip. I am not sure whether this tensorflow-base came with my tensorflow-gpu installation.
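A quick way to see which build the interpreter actually picks up inside the environment (just a diagnostic sketch) is to print the version and file path of the imported package:

import tensorflow as tf

# Shows which installation is actually being imported
print(tf.__version__)  # e.g. 1.11.0
print(tf.__file__)     # path reveals whether it is the pip or the conda package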

ADDENDUM 2

It looks like tensorflow-base was part of conda, because I could uninstall it with conda uninstall tensorflow-base. I still have the tensorflow-gpu installation in place, but now I cannot import tensorflow anymore; it says "No module named tensorflow". It looks like my conda environment is not seeing my tensorflow-gpu installation. I am quite confused at the moment.

Upvotes: 3

Views: 2360

Answers (2)

edn

Reputation: 2183

@Smokrow, thank you for your answer above. It appears that Keras has problems with multiprocessing on Windows.

history = merged_model.fit_generator(generator=train_generator,
                                     epochs=60,
                                     verbose=2,
                                     callbacks=[reduce_lr_on_plateau],
                                     validation_data=val_generator,
                                     use_multiprocessing=True,
                                     max_queue_size=50,
                                     workers=3)

The piece of code above causes Keras to hang, and literally no progress is seen. If you are running your code on Windows, use_multiprocessing needs to be set to False; otherwise it does not work. Interestingly, workers can still be set to a number greater than one, and it still gives a performance benefit. I have difficulty understanding what is really happening in the background, but the speed-up is there. The following piece of code made it work.

history = merged_model.fit_generator(generator=train_generator,
                                     epochs=60,
                                     verbose=2,
                                     callbacks=[reduce_lr_on_plateau],
                                     validation_data=val_generator,
                                     use_multiprocessing=False,  # CHANGED
                                     max_queue_size=50,
                                     workers=3)

Upvotes: 1

Smokrow

Reputation: 241

Depending on the size of your network, it could be that your CPU is busy loading data most of the time.

Since you are using Python generators, most of your time will be spent in Python code opening your files. The generator is probably bottlenecking your pipeline.

Once a batch is loaded, it is probably evaluated almost instantly on the GPU, resulting in close to 0% GPU utilization, since your GPU spends most of its time waiting for new data. You could try using TensorFlow's tf.data API; TFRecord files are extremely fast to load. Take a look at this article.
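A minimal sketch of such a pipeline (assuming TF 1.x, TFRecords already written to a hypothetical train.tfrecords, and a parse function adapted to whatever features you serialized) could look like this:

import tensorflow as tf

# Hypothetical feature layout; adapt to the features you actually serialized
def parse_example(serialized):
    features = tf.parse_single_example(
        serialized,
        features={
            "image": tf.FixedLenFeature([], tf.string),
            "label": tf.FixedLenFeature([], tf.int64),
        })
    image = tf.decode_raw(features["image"], tf.uint8)
    image = tf.cast(image, tf.float32) / 255.0
    return image, features["label"]

dataset = (tf.data.TFRecordDataset("train.tfrecords")  # assumed file name
           .map(parse_example, num_parallel_calls=4)   # decode in parallel, outside the Python generator
           .shuffle(buffer_size=1000)
           .batch(32)
           .repeat()
           .prefetch(1))                               # keep a batch ready while the GPU computes

# tf.keras in TF 1.11 can consume the dataset directly, e.g.:
# merged_model.fit(dataset, epochs=60, steps_per_epoch=steps_per_epoch)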

Upvotes: 1
