Reputation: 419
I am implementing a model inspired by the NMT model. My training set is stored as TFRecords files, and I use a TFRecordDataset to fetch it and feed the model. Following Google's recommendations on improving input pipeline performance, I have added num_parallel_calls and prefetch to the Dataset map operations. However, GPU utilization stays at 40% at most, and training is barely faster than when run on the CPU. I am thus wondering about the prefetch operation.
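For reference, this is roughly what my pipeline looks like (a sketch only: the file name, the parse function, the feature name "tokens", the sequence length and the num_parallel_calls value below are placeholders, not my exact code):

```python
import tensorflow as tf

batch_size = 100

def parse_example(serialized):
    # Placeholder parse function: the feature name "tokens" and the
    # fixed length of 50 are illustrative, not my real schema.
    features = tf.parse_single_example(
        serialized, {"tokens": tf.FixedLenFeature([50], tf.int64)})
    return features["tokens"]

dataset = tf.data.TFRecordDataset(["train.tfrecords"])
dataset = dataset.map(parse_example, num_parallel_calls=4)  # parallel parsing
dataset = dataset.prefetch(buffer_size=1000 * batch_size)   # as in the NMT code
dataset = dataset.batch(batch_size)

iterator = dataset.make_one_shot_iterator()
next_batch = iterator.get_next()
```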
If I understand correctly, it will create a dedicated thread that buffers N examples. But what does that mean exactly? What happens to the examples that are not buffered?
Is there an optimal relation between the prefetch buffer size, the number of examples in the complete Dataset, and the batch size? In the NMT code, the prefetch buffer size is set to 1000*batch_size, but why? If, for example, I am using 10000 examples and a batch size of 100, what should the prefetch buffer size be?
Any other advice regarding Dataset speedup would be appreciated.
Upvotes: 1
Views: 398
Reputation: 419
Apparently, the Dataset API runs on the CPU and not on the GPU, so this answers the question.
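One way to see this is to enable device placement logging in the session; the Dataset and Iterator ops are reported on the CPU (a minimal sketch, assuming TF 1.x):

```python
import tensorflow as tf

# Minimal pipeline just to inspect where its ops are placed.
dataset = tf.data.Dataset.range(10).batch(2)
next_batch = dataset.make_one_shot_iterator().get_next()

# log_device_placement prints the device each op is assigned to;
# the Dataset/Iterator ops show up on the CPU.
with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
    print(sess.run(next_batch))
```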
Upvotes: 0