Reputation: 1236
After reading this GitHub issue I feel like I'm missing something in my understanding of queues:
https://github.com/tensorflow/tensorflow/issues/3009
I thought that when loading data into a queue, it would get pre-transferred to the GPU while the previous batch is being computed, so that there is virtually no bandwidth bottleneck, assuming computation takes longer than the time to load the next batch.
But the above link suggests that there is an expensive copy from the queue into the graph (numpy <-> TF) and that it would be faster to load the files into the graph and do the preprocessing there instead. That doesn't make sense to me. Why does it matter whether I load a 256x256 image from a file or from a raw numpy array? If anything, I would expect the numpy version to be faster. What am I missing?
Upvotes: 4
Views: 3000
Reputation: 553
The documentation suggests that it is possible to pin a queue to a device:
N.B. Queue methods (such as q.enqueue(...)) must run on the same device as the queue. Incompatible device placement directives will be ignored when creating these operations.
But the above implies to me that any variables one is attempting to enqueue should already be on the GPU.
This comment suggests it may be possible to use tf.identity to perform the prefetch.
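A minimal TF 1.x sketch of both points (the shapes and the random-image stand-in are invented for illustration): the queue, and therefore its enqueue/dequeue ops, lives on the CPU, while a tf.identity op pinned to /gpu:0 turns the host-to-device copy into an explicit graph node. Whether that copy actually overlaps with the previous step's compute depends on the runtime, so treat this as the workaround the comment hints at, not a guaranteed prefetch.

```python
import tensorflow as tf

# CPU-side queue: per the docs quoted above, enqueue/dequeue ops are placed
# on the same device as the queue itself.
with tf.device('/cpu:0'):
    q = tf.FIFOQueue(capacity=10, dtypes=[tf.float32], shapes=[[256, 256, 3]])
    enqueue_op = q.enqueue(tf.random_uniform([256, 256, 3]))  # stand-in for a decoded image
    cpu_batch = q.dequeue()

# Identity op pinned to the GPU: the host-to-device copy becomes an explicit node.
with tf.device('/gpu:0'):
    gpu_batch = tf.identity(cpu_batch)
    loss = tf.reduce_mean(gpu_batch)  # downstream compute stays on the GPU

tf.train.add_queue_runner(tf.train.QueueRunner(q, [enqueue_op]))

with tf.Session(config=tf.ConfigProto(allow_soft_placement=True)) as sess:
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    print(sess.run(loss))
    coord.request_stop()
    coord.join(threads)
```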
Upvotes: 2
Reputation: 57953
There is no GPU implementation of queues, so a queue only loads data into main memory and there is no asynchronous prefetching onto the GPU. You could build something like a GPU-based queue yourself using variables pinned to gpu:0.
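A minimal TF 1.x sketch of that idea (the shapes and the random-image feed are invented for illustration): a variable pinned to gpu:0 acts as a one-slot staging buffer, and running the assign op copies the next batch out of the CPU-side queue into GPU memory before the step that consumes it.

```python
import tensorflow as tf

# CPU-side input pipeline (stand-in for real file reading / preprocessing).
with tf.device('/cpu:0'):
    q = tf.FIFOQueue(capacity=10, dtypes=[tf.float32], shapes=[[256, 256, 3]])
    enqueue_op = q.enqueue(tf.random_uniform([256, 256, 3]))
    next_batch = q.dequeue()

# GPU-resident variable used as a one-slot staging buffer.
with tf.device('/gpu:0'):
    staged = tf.Variable(tf.zeros([256, 256, 3]), trainable=False)
    prefetch = tf.assign(staged, next_batch)  # host -> device copy
    loss = tf.reduce_mean(staged)             # compute reads GPU memory only

tf.train.add_queue_runner(tf.train.QueueRunner(q, [enqueue_op]))

with tf.Session(config=tf.ConfigProto(allow_soft_placement=True)) as sess:
    sess.run(tf.global_variables_initializer())
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    sess.run(prefetch)  # stage the first batch on the GPU
    for _ in range(3):
        # Consume the staged batch, then copy the next one over. A real
        # version would double-buffer (two variables) so the copy can
        # overlap with the compute of the current step.
        print(sess.run(loss))
        sess.run(prefetch)
    coord.request_stop()
    coord.join(threads)
```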
Upvotes: 4