jstaker7

Reputation: 1236

Understanding TensorFlow queues and CPU <-> GPU transfer

After reading this GitHub issue, I feel like I'm missing something in my understanding of queues:

https://github.com/tensorflow/tensorflow/issues/3009

I thought that when loading data into a queue, it would get pre-transferred to the GPU while the previous batch is being computed, so that there is virtually no bandwidth bottleneck, assuming computation takes longer than loading the next batch.

But the above link suggests that there is an expensive copy from the queue into the graph (numpy <-> TF) and that it would be faster to load the files inside the graph and do the preprocessing there instead. But that doesn't make sense to me. Why does it matter whether I load a 256x256 image from a file or from a raw numpy array? If anything, I would think that the numpy version is faster. What am I missing?
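For reference, here is roughly the kind of pipeline contrast I have in mind, sketched with TF 1.x queue ops (file names, image size, and batch size are just placeholders, not my actual setup):

```python
import tensorflow as tf

# Variant A: feed numpy arrays from Python into a queue
# (a Python thread would call sess.run(enqueue_op, feed_dict={image_in: arr})).
queue = tf.FIFOQueue(capacity=32, dtypes=[tf.float32], shapes=[[256, 256, 3]])
image_in = tf.placeholder(tf.float32, shape=[256, 256, 3])
enqueue_op = queue.enqueue(image_in)
batch_a = tf.train.batch([queue.dequeue()], batch_size=16)

# Variant B: read and decode the files inside the graph instead.
filename_queue = tf.train.string_input_producer(["img0.png", "img1.png"])
reader = tf.WholeFileReader()
_, contents = reader.read(filename_queue)
image = tf.image.decode_png(contents, channels=3)
image = tf.image.resize_images(tf.to_float(image), [256, 256])
batch_b = tf.train.batch([image], batch_size=16)
```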

Upvotes: 4

Views: 3000

Answers (2)

Simon

Reputation: 553

The documentation suggests that it is possible to pin a queue to a device:

N.B. Queue methods (such as q.enqueue(...)) must run on the same device as the queue. Incompatible device placement directives will be ignored when creating these operations.

But the above implies to me that any variables one is attempting to enqueue should already be on the GPU.

This comment suggests it may be possible to use tf.identity to perform the prefetch.
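A rough sketch of how I read that suggestion (queue capacity and shapes here are arbitrary, and I haven't verified that this actually overlaps the copy with compute):

```python
import tensorflow as tf

# The queue itself stays on the CPU (see the quoted note about device placement).
q = tf.FIFOQueue(capacity=8, dtypes=[tf.float32], shapes=[[256, 256, 3]])
image_in = tf.placeholder(tf.float32, shape=[256, 256, 3])
enqueue_op = q.enqueue(image_in)

with tf.device('/gpu:0'):
    # Wrapping the dequeued tensor in tf.identity places an op with a GPU
    # kernel on the device, so the host -> device copy happens at this point
    # in the graph rather than inside the ops that consume the image.
    image_on_gpu = tf.identity(q.dequeue())
```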

Upvotes: 2

Yaroslav Bulatov

Reputation: 57953

There's no implementation of a GPU queue, so it only loads the data into main memory and there's no asynchronous prefetching onto the GPU. You could build something like a GPU-based queue yourself using variables pinned to gpu:0.
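A minimal sketch of that idea (the buffer shape and the single-buffer setup are assumptions for illustration only):

```python
import tensorflow as tf

# CPU-side queue, fed elsewhere in the program (capacity/shape are illustrative).
cpu_queue = tf.FIFOQueue(capacity=8, dtypes=[tf.float32], shapes=[[16, 256, 256, 3]])

with tf.device('/gpu:0'):
    # Buffer variable that lives in GPU memory and acts as a one-slot "queue".
    gpu_buffer = tf.Variable(tf.zeros([16, 256, 256, 3]), trainable=False)
    # Running this op performs the host -> device copy of the next batch.
    prefetch_op = tf.assign(gpu_buffer, cpu_queue.dequeue())

# A training step would compute on gpu_buffer and run prefetch_op to pull in
# the following batch; overlapping the two cleanly generally needs two such
# buffers that you alternate between (double buffering).
```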

Upvotes: 4
