Laci Szakács

Reputation: 31

Low GPU usage when training a CNN

I just installed tensorflow-gpu and started training my convolutional neural network. The problem is that GPU usage stays at 0% most of the time and only occasionally rises to about 20%, while CPU usage is around 20% and disk usage is above 60%. To check that the installation works, I ran some matrix multiplications; in that case everything was fine and GPU usage was above 90%.

with tf.device("/gpu:0"):
    ...  # here I set up the computational graph

When I run the graph I use the following, so that TensorFlow decides for each operation whether it has a GPU implementation (and places it on the CPU if it does not):

with tf.Session(config=tf.ConfigProto(allow_soft_placement=True)) as sess:

I have an NVIDIA GeForce GTX 950M graphics card and I don't get any errors at runtime. What am I doing wrong?

Later edit: my computation graph

with tf.device("/gpu:0"):
    X = tf.placeholder(tf.float32, shape=[None, height, width, channels], name="X")
    dropout_rate= 0.3

    training = tf.placeholder_with_default(False, shape=(), name="training")
    X_drop = tf.layers.dropout(X, dropout_rate, training = training)

    y = tf.placeholder(tf.int32, shape = [None], name="y")

    conv1 = tf.layers.conv2d(X_drop, filters=32, kernel_size=3,
                            strides=1, padding="SAME",
                            activation=tf.nn.relu, name="conv1")

    conv2 = tf.layers.conv2d(conv1, filters=64, kernel_size=3,
                            strides=2, padding="SAME",
                            activation=tf.nn.relu, name="conv2")

    pool3 = tf.nn.max_pool(conv2,
                            ksize=[1, 2, 2, 1],
                            strides=[1, 2, 2, 1],

    conv4 = tf.layers.conv2d(pool3, filters=128, kernel_size=4,
                            strides=3, padding="SAME",
                            activation=tf.nn.relu, name="conv4")

    pool5 = tf.nn.max_pool(conv4,
                            ksize=[1, 2, 2, 1],
                            strides=[1, 1, 1, 1],

    pool5_flat = tf.reshape(pool5, shape = [-1, 128*2*2])

    fullyconn1 = tf.layers.dense(pool5_flat, 128, activation=tf.nn.relu, name = "fc1")
    fullyconn2 = tf.layers.dense(fullyconn1, 64, activation=tf.nn.relu, name = "fc2")

    logits = tf.layers.dense(fullyconn2, 2, name="output")

    xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits, labels=y)

    loss = tf.reduce_mean(xentropy)
    optimizer = tf.train.AdamOptimizer()
    training_op = optimizer.minimize(loss)

    correct = tf.nn.in_top_k(logits, y, 1)
    accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))

    init = tf.global_variables_initializer()
saver = tf.train.Saver()

hm_epochs = 100
config = tf.ConfigProto(allow_soft_placement=True)
config.gpu_options.allow_growth = True

The batch size is 128.

with tf.Session(config=config) as sess:
    tbWriter = tf.summary.FileWriter(logPath, sess.graph)
    dataset = tf.data.Dataset.from_tensor_slices((training_images, training_labels))
    dataset =
    dataset = dataset.batch(batch_size)

    testset = tf.data.Dataset.from_tensor_slices((test_images, test_labels))
    testset =
    testset = testset.batch(len(test_images))

    iterator = dataset.make_initializable_iterator()
    test_iterator = testset.make_initializable_iterator()
    next_element = iterator.get_next()
    next_test = test_iterator.get_next()  # create this op once, not once per epoch

    sess.run(init)  # initialize the model variables
    for epoch in range(hm_epochs):
        epoch_loss = 0
        sess.run(iterator.initializer)  # rewind the training iterator every epoch
        while True:
            try:
                epoch_x, epoch_y = sess.run(next_element)
                # _, c = sess.run([optimizer, cost], feed_dict={x: epoch_x, y: epoch_y})
                # epoch_loss += c
                sess.run(training_op, feed_dict={X: epoch_x, y: epoch_y, training: True})
            except tf.errors.OutOfRangeError:
                break  # training set exhausted: end of the epoch

        # acc_train = accuracy.eval(feed_dict={X: epoch_x, y: epoch_y})
        sess.run(test_iterator.initializer)
        try:
            test_images, test_labels = sess.run(next_test)
            acc_test = accuracy.eval(feed_dict={X: test_images, y: test_labels})
            print("Epoch {0}: Test accuracy {1}".format(epoch, acc_test))
        except tf.errors.OutOfRangeError:
            pass
        # print("Epoch {0}: Train accuracy {1}, Test accuracy: {2}".format(epoch, acc_train, acc_test))
    save_path = saver.save(sess, "./my_first_model")

I have 9k training pictures and 3k test pictures.
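With a batch size of 128, every epoch is a sequence of small training steps, one per batch. A quick standard-library sketch of that arithmetic, assuming exactly 9,000 training and 3,000 test pictures (the post only says "9k" and "3k"); `batch_indices` is a hypothetical helper that mimics how `Dataset.batch` groups consecutive elements by default, not TensorFlow code:

```python
import math


def batch_indices(num_examples, batch_size):
    """Split example indices into consecutive batches, keeping a final partial batch
    (like Dataset.batch with its default drop_remainder=False)."""
    return [
        list(range(start, min(start + batch_size, num_examples)))
        for start in range(0, num_examples, batch_size)
    ]


# Assumes exactly 9,000 training and 3,000 test pictures ("9k" and "3k" in the post).
train_batches = batch_indices(9000, batch_size=128)
test_batches = batch_indices(3000, batch_size=3000)  # like testset.batch(len(test_images))

print(len(train_batches), len(train_batches[-1]))  # 71 40
print(math.ceil(9000 / 128))                       # 71
print(len(test_batches), len(test_batches[0]))     # 1 3000
```

So under these assumptions one epoch is 71 training steps (70 full batches of 128 plus one of 40), and the whole test set is evaluated as a single batch of 3,000 pictures.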

Upvotes: 1

Views: 1486

Answers (2)


Reputation: 26

You could try the following code to see whether TensorFlow recognizes your GPU:

from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())

Upvotes: 0


Reputation: 3476

There are a few issues in your code that may result in low GPU usage.

1) Add a prefetch step at the end of your Dataset pipeline so that the CPU keeps a buffer of input batches ready to be moved to the GPU.

# this should be the last thing in your pipeline
dataset = dataset.prefetch(1)

2) You are using feed_dict to feed your model, along with Dataset iterators. This is not the intended way! feed_dict is the slowest way of getting data into your model and is not recommended: each batch is first fetched from the iterator into Python as NumPy arrays and then copied back into the graph through feed_dict. Instead, define your model in terms of the next_element outputs of the iterators.

next_x, next_y = iterator.get_next()
with tf.device('/GPU:0'):
    conv1 = tf.layers.conv2d(next_x, filters=32, kernel_size=3,
                        strides=1, padding="SAME",
                        activation=tf.nn.relu, name="conv1")
    # rest of model here...
    xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits, labels=next_y)

Then you can run your training operation without feed_dict, and the iterator will feed data to your model behind the scenes. Here is another related Q&A. Your new training loop would look something like this:

while True:
    try:
        sess.run(training_op, feed_dict={training: True})
    except tf.errors.OutOfRangeError:
        break

You should only input data via feed_dict that your iterator does not provide, and these should typically be very lightweight.
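The idea behind `dataset.prefetch(1)` in point 1 can be illustrated without TensorFlow. The following is a minimal standard-library sketch, assuming a simple producer/consumer model: a background thread stands in for the CPU input pipeline and fills a bounded buffer, while the main loop stands in for GPU training steps. It is only an analogy for the overlap that prefetching provides, not how `tf.data` is implemented; the function names and sleep durations are hypothetical.

```python
import queue
import threading
import time


def _producer(num_batches, buf, sentinel, prep_seconds):
    """Background 'input pipeline': prepares batches and pushes them into a bounded buffer."""
    for batch_id in range(num_batches):
        time.sleep(prep_seconds)  # hypothetical decode/augment cost on the CPU
        buf.put(batch_id)         # blocks while the buffer is full
    buf.put(sentinel)             # signals the end of the epoch


def train_with_prefetch(num_batches, buffer_size=1, prep_seconds=0.01, step_seconds=0.01):
    """Consume batches while the producer thread prepares the next ones in parallel."""
    buf = queue.Queue(maxsize=buffer_size)
    sentinel = object()
    worker = threading.Thread(
        target=_producer, args=(num_batches, buf, sentinel, prep_seconds), daemon=True
    )
    worker.start()
    consumed = []
    while True:
        batch_id = buf.get()      # often already waiting in the buffer
        if batch_id is sentinel:
            break
        time.sleep(step_seconds)  # hypothetical training step on the GPU
        consumed.append(batch_id)
    worker.join()
    return consumed


start = time.perf_counter()
print(train_with_prefetch(5))  # [0, 1, 2, 3, 4]
print("elapsed: {:.3f}s".format(time.perf_counter() - start))
```

In this toy model, with `buffer_size=1` the preparation of batch n+1 overlaps with the training step on batch n, so the consumer spends less time waiting than it would if it prepared and trained each batch in turn.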

For further performance tips, you can refer to this guide on the TensorFlow website.

Upvotes: 1
