Tong Shen

Reputation: 835

Reading a large dataset in TensorFlow

I am not quite sure how the file queue works. I am trying to use a large dataset such as ImageNet as input, so preloading all the data into memory is not an option, and I am wondering how to use the file queue instead. According to the tutorial, we can convert the data to a TFRecords file and use that as input, which gives us a single big TFRecords file. When we specify a FIFO queue for the reader, does that mean the program fetches one batch of data at a time and feeds it to the graph, instead of loading the whole file?

Upvotes: 3

Views: 2714

Answers (1)

Yaroslav Bulatov

Reputation: 57893

The amount of prefetching depends on your queue capacity. If you use string_input_producer for your filenames and batch for batching, you will have two queues: the filename queue and the prefetching queue created by batch. The queue created by batch has a default capacity of 32, controlled by the batch(..., capacity=) argument, so it can prefetch up to 32 images. If you follow the outline in the official TensorFlow how-tos, processing the examples (everything after batch) happens in the main Python thread, whereas filling the queue happens in threads created by batch and started by start_queue_runners. Prefetching new data and running already-prefetched data through the network therefore occur concurrently: the filling threads block when the queue is full, and the main thread blocks when the queue is empty.
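To make the blocking behaviour concrete, here is a minimal sketch in plain Python. It is only an analogy built on the standard library's queue and threading modules, not TensorFlow's implementation. The names producer, consume and run_pipeline are hypothetical, and the integer records stand in for examples read lazily from a TFRecords file. The capacity of 32 mirrors the default mentioned above.

```python
import queue
import threading

CAPACITY = 32  # mirrors the default batch(..., capacity=32) described above


def producer(records, q):
    """Fill the bounded queue; put() blocks while the queue is full."""
    for record in records:
        q.put(record)
    q.put(None)  # sentinel: signals that there is no more data


def consume(q, batch_size):
    """Yield batches from the queue; get() blocks while the queue is empty."""
    batch = []
    while True:
        item = q.get()
        if item is None:
            break
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # leftover records that do not fill a whole batch
        yield batch


def run_pipeline(num_records=100, batch_size=8, capacity=CAPACITY):
    """Run the producer in a background thread and consume in the caller."""
    q = queue.Queue(maxsize=capacity)
    # Lazy generator: a stand-in for streaming records from a large file,
    # so at most `capacity` records wait in the queue at any time.
    records = (i for i in range(num_records))
    worker = threading.Thread(target=producer, args=(records, q), daemon=True)
    worker.start()
    batches = list(consume(q, batch_size))
    worker.join()
    return batches


if __name__ == "__main__":
    for b in run_pipeline(num_records=20, batch_size=8):
        print(b)
```

The background thread plays the role of the queue-runner threads and the calling thread plays the role of the main Python thread that runs the network. Because the queue is bounded, only a small window of the dataset is held in memory at once, which is the point of the answer above.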

Upvotes: 2
