Molly Zhang

Reputation: 81

GPU under utilization using tensorflow dataset

[screenshot: TensorFlow profiler trace]

While training, my GPU utilization sits around 40%, and the TensorFlow profiler clearly shows a data-copy operation taking a lot of time (see the attached trace). I presume the "MEMCPYHtoD" operation is copying each batch from CPU to GPU and is blocking the GPU from doing useful work. Is there any way to prefetch data to the GPU, or are there other problems that I am not seeing?

Here is the code for dataset:

# Placeholders let the large numpy arrays be fed once, at iterator
# initialization, instead of being baked into the graph as constants.
X_placeholder = tf.placeholder(tf.float32, data.train.X.shape)
y_placeholder = tf.placeholder(tf.float32, data.train.y[label].shape)

dataset = tf.data.Dataset.from_tensor_slices({"X": X_placeholder, 
                                              "y": y_placeholder})
dataset = dataset.repeat(1000)   # iterate over the data 1000 times
dataset = dataset.batch(1000)    # batches of 1000 examples
dataset = dataset.prefetch(2)    # prefetch 2 batches (into host memory)
iterator = dataset.make_initializable_iterator()
next_element = iterator.get_next()
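As background for why prefetching helps here, this is a toy sketch (plain Python, not TensorFlow; the `prefetch` and `make_batches` helpers are illustrative only): a producer thread fills a small bounded buffer so the consumer (standing in for the GPU training step) rarely stalls waiting for the next batch, which is what `dataset.prefetch(2)` does under the hood.

```python
import queue
import threading
import time

def make_batches(n):
    """Simulate host-side batch preparation (e.g. the HtoD copy)."""
    for i in range(n):
        time.sleep(0.01)  # stand-in for data preparation cost
        yield i

def prefetch(gen, buffer_size=2):
    """Fill a bounded buffer from a background thread, like prefetch(2)."""
    buf = queue.Queue(maxsize=buffer_size)
    sentinel = object()

    def producer():
        for item in gen:
            buf.put(item)
        buf.put(sentinel)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = buf.get()
        if item is sentinel:
            return
        yield item

# The consumer's "compute" now overlaps the producer's preparation,
# instead of the two running strictly back-to-back.
results = []
for batch in prefetch(make_batches(5), buffer_size=2):
    time.sleep(0.01)  # stand-in for the training step
    results.append(batch)

print(results)  # [0, 1, 2, 3, 4]
```

The same idea applies on the device side: the answers below move this buffering onto the GPU so the HtoD copy itself is overlapped with compute.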

Upvotes: 8

Views: 2173

Answers (2)

Alaroff

Reputation: 2298

Prefetching to a single GPU:

  • Consider a more flexible approach than prefetch_to_device, e.g. explicitly copying to the GPU with tf.data.experimental.copy_to_device(...) and then prefetching. This avoids the restriction that prefetch_to_device must be the last transformation in a pipeline, and lets you incorporate further tricks to optimize the Dataset pipeline's performance (e.g. by overriding the threadpool distribution).
  • Try out the experimental tf.contrib.data.AUTOTUNE option for prefetching, which lets the tf.data runtime automatically tune the prefetch buffer size based on your system and environment.

In the end, you might end up doing something like this:

dataset = dataset.apply(tf.data.experimental.copy_to_device("/gpu:0"))
dataset = dataset.prefetch(tf.contrib.data.AUTOTUNE)

Upvotes: 5

Luke

Reputation: 7089

I believe you can now fix this problem by using prefetch_to_device. Instead of the line:

dataset = dataset.prefetch(2)

do

dataset = dataset.apply(tf.contrib.data.prefetch_to_device('/gpu:0', buffer_size=2))

Upvotes: 2
