Jeff Boker

Reputation: 909

Tensorflow with GPU slower than expected

So I recently tried running tensorflow-gpu on a PC with the following specs:

AMD Ryzen 5 2600X (6 cores), NVIDIA GeForce RTX 2060, 16 GB RAM

I ran the built-in Fashion MNIST dataset from the tutorial on Colab. I ran the following check and noticed that Colab was not running on a GPU:

print("GPU is", "available" if tf.config.list_physical_devices('GPU') else "NOT AVAILABLE")

So I went through the tutorial and essentially ran their code:

import tensorflow as tf
import time

print("GPU is", "available" if tf.config.list_physical_devices('GPU') else "NOT AVAILABLE")

mnist = tf.keras.datasets.fashion_mnist
(training_images, training_labels), (test_images, test_labels) = mnist.load_data()
training_images  = training_images / 255.0
test_images = test_images / 255.0

model = tf.keras.models.Sequential([tf.keras.layers.Flatten(), 
                                    tf.keras.layers.Dense(128, activation=tf.nn.relu), 
                                    tf.keras.layers.Dense(10, activation=tf.nn.softmax)])



start_time = time.time()
model.compile(optimizer = tf.keras.optimizers.Adam(),
              loss = 'sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(training_images, training_labels, epochs=5)

print("My program took {} seconds to run".format(time.time() - start_time))

On Colab it took ~17 seconds to compile and fit the model. When I ran it on my computer, which did detect the GPU, the same process took ~13 seconds. I was under the impression that my GPU would be dramatically faster, so I'm wondering whether there is a problem with my setup or whether I'm using the GPU incorrectly.

Also, I am running Python 3.7.7, TensorFlow 2.1.0, and Keras 2.2.4-tf.
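
In case it helps, I assume I could also confirm where the ops actually run with TensorFlow's device placement logging, something like this (not sure whether this is the right way to verify it):

import tensorflow as tf

# Log which device each op is placed on, so I can see whether the math
# actually runs on /GPU:0 or silently falls back to the CPU.
tf.debugging.set_log_device_placement(True)

print("GPUs:", tf.config.list_physical_devices('GPU'))

a = tf.random.uniform((1000, 1000))
b = tf.matmul(a, a)  # the placement log for this op should mention GPU:0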

Upvotes: 6

Views: 4749

Answers (2)

MX-Qulin

Reputation: 123

Glad to help!

I think the problem is that this program uses very little computing power. Try my program instead and you should see the difference between the GPU and the CPU~

#!/usr/bin/env python
# -*- coding:utf-8 -*-
import sys
from datetime import datetime

import tensorflow as tf

# On TF 2.x the session-based API lives under tf.compat.v1, and eager execution
# has to be disabled for session.run() to work.
# On TF 1.x delete this line and use tf.Session / tf.ConfigProto directly below.
tf.compat.v1.disable_eager_execution()

size = 10000  # Matrix size (the bigger it is, the more the GPU outpaces the CPU).
device_name = sys.argv[1] if len(sys.argv) > 1 else "gpu"  # Choose device from the command line: gpu or cpu.
shape = (size, size)

if device_name == "gpu":
    device_name = "/gpu:0"
else:
    device_name = "/cpu:0"

with tf.device(device_name):
    # On TF 1.x use tf.random_uniform instead of tf.random.uniform.
    random_matrix = tf.random.uniform(shape=shape, minval=0, maxval=1)
    dot_operation = tf.matmul(random_matrix, tf.transpose(random_matrix))
    sum_operation = tf.reduce_sum(dot_operation)

startTime = datetime.now()

# On TF 1.x use: tf.Session(config=tf.ConfigProto(log_device_placement=True))
with tf.compat.v1.Session(config=tf.compat.v1.ConfigProto(log_device_placement=True)) as session:
    result = session.run(sum_operation)
    print(result)

# The device-placement log is very verbose, so print some blank lines
# before the summary to keep it readable.
print("\n" * 5)
print("Shape:", shape, "Device:", device_name)
print("Time taken:", datetime.now() - startTime)
print("\n" * 5)

Hope it can help.

Wish you a good day:)

Upvotes: 4

user12128336

Reputation:

Explanation: The problem is that the network isn't very big. With smaller networks, training on the CPU works fine and is already fairly quick, so there isn't much of an improvement from running it on the GPU. But if your network were a lot larger, the CPU would struggle much more, allowing the GPU to really shine.

Analogy: If you compute np.zeros([5]) * np.zeros([5]) (multiplying two arrays filled with zeros) on the CPU, it takes a few microseconds because it is such a simple task, and doing it on the GPU takes a similar amount of time. But if the arrays had 10,000,000 elements instead of just 5, the GPU could be noticeably faster.

Your answer: The real improvement only comes once the CPU is fully utilized. With larger neural networks, the matrix multiplications become hard enough that the CPU is completely occupied by this one task. Once the network is large enough to max out the CPU, the GPU can become 10 or even 20x faster, since it can handle much larger workloads in parallel.
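
For instance (a rough sketch based on your own example; the 4096-unit width is an arbitrary choice and the exact timings will depend on your hardware), widening the hidden layers makes the matrix multiplications big enough that the GPU's advantage becomes obvious:

import time
import tensorflow as tf

mnist = tf.keras.datasets.fashion_mnist
(training_images, training_labels), _ = mnist.load_data()
training_images = training_images / 255.0

# Deliberately wide layers: the per-batch matrix multiplications are now
# large enough that the GPU's parallelism should clearly pay off.
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(4096, activation=tf.nn.relu),
    tf.keras.layers.Dense(4096, activation=tf.nn.relu),
    tf.keras.layers.Dense(10, activation=tf.nn.softmax)])

model.compile(optimizer=tf.keras.optimizers.Adam(),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

start_time = time.time()
model.fit(training_images, training_labels, epochs=5)
print("Wider model took {} seconds".format(time.time() - start_time))

Run it once with the GPU visible and once with it hidden (for example by setting CUDA_VISIBLE_DEVICES=-1 before starting Python) and compare the two times.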

But if you would like to train your networks faster across the board, consider training with graph execution. This is a different execution mode from the default one (eager execution); it is somewhat faster, and it is the reason TF 1 (which defaults to graph execution) is usually faster than TF 2. Here is an example of how to use it:

import tensorflow as tf
import time

print("GPU is", "available" if tf.config.list_physical_devices('GPU') else "NOT AVAILABLE")

mnist = tf.keras.datasets.fashion_mnist
(training_images, training_labels), (test_images, test_labels) = mnist.load_data()
training_images  = training_images / 255.0
test_images = test_images / 255.0

graph = tf.Graph()

with graph.as_default():
  model = tf.keras.models.Sequential([tf.keras.layers.Flatten(),
                                      tf.keras.layers.Dense(128, activation=tf.nn.relu),
                                      tf.keras.layers.Dense(10, activation=tf.nn.softmax)])

  start_time = time.time()
  model.compile(optimizer = tf.keras.optimizers.Adam(),
                loss = 'sparse_categorical_crossentropy',
                metrics=['accuracy'])
  model.fit(training_images, training_labels, epochs=5)

print("My program took {} seconds to run".format(time.time() - start_time))

During my tests, it shaved off about a second compared to eager execution.
Eager execution: 17.3 seconds on average.
Graph execution: 16.3 seconds on average.
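
If you are on TF 2.x, note that the more idiomatic way to get graph execution is tf.function, which traces a Python function into a graph. Below is a minimal sketch of a custom training step wrapped in tf.function (whether it beats plain model.fit depends on the model, so treat the timing as something to measure rather than assume):

import time
import tensorflow as tf

mnist = tf.keras.datasets.fashion_mnist
(training_images, training_labels), _ = mnist.load_data()
training_images = (training_images / 255.0).astype("float32")

model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation=tf.nn.relu),
    tf.keras.layers.Dense(10, activation=tf.nn.softmax)])

loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()
optimizer = tf.keras.optimizers.Adam()

@tf.function  # traces the step into a graph instead of running it eagerly
def train_step(images, labels):
    with tf.GradientTape() as tape:
        loss = loss_fn(labels, model(images))
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss

dataset = tf.data.Dataset.from_tensor_slices(
    (training_images, training_labels)).batch(32)

start_time = time.time()
for epoch in range(5):
    for images, labels in dataset:
        train_step(images, labels)
print("tf.function training loop took {} seconds".format(time.time() - start_time))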

Upvotes: 1
