Reputation: 419
I am implementing a model inspired by the NMT model. My training set is stored as TFRecords files, and I use a TFRecordDataset to fetch it and feed the model. Following Google's recommendations on improving input pipeline performance, I have added num_parallel_calls and prefetch to the Dataset map operations. However, GPU utilization stays at 40% at most, and training is barely faster than when run on the CPU. I am thus wondering about the prefetch operation.
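For reference, this is roughly what my pipeline looks like (a sketch only: the file name, the parse function, the feature name "tokens", the sequence length and the num_parallel_calls value below are placeholders, not my exact code):

```python
import tensorflow as tf

batch_size = 100

def parse_example(serialized):
    # Placeholder parse function: the feature name "tokens" and the
    # fixed length of 50 are illustrative, not my real schema.
    features = tf.parse_single_example(
        serialized, {"tokens": tf.FixedLenFeature([50], tf.int64)})
    return features["tokens"]

dataset = tf.data.TFRecordDataset(["train.tfrecords"])
dataset = dataset.map(parse_example, num_parallel_calls=4)  # parallel parsing
dataset = dataset.prefetch(buffer_size=1000 * batch_size)   # as in the NMT code
dataset = dataset.batch(batch_size)

iterator = dataset.make_one_shot_iterator()
next_batch = iterator.get_next()
```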
If I understand correctly, it will create a dedicated thread that buffers N examples. But what does that mean exactly? What happens to the examples that are not buffered?
Is there an optimal relation between the prefetch buffer size, the number of examples in the complete Dataset, and the batch size? In the NMT code, the prefetch buffer size is set to 1000*batch_size, but why? If, for example, I am using 10000 examples and a batch size of 100, what should the prefetch buffer size be?
Any other advice regarding Dataset speedup would be appreciated.
Upvotes: 1
Views: 398
Reputation: 419
Apparently, the Dataset API runs on the CPU and not on the GPU, so this answers the question.
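One way to see this is to enable device placement logging in the session; the Dataset and Iterator ops are reported on the CPU (a minimal sketch, assuming TF 1.x):

```python
import tensorflow as tf

# Minimal pipeline just to inspect where its ops are placed.
dataset = tf.data.Dataset.range(10).batch(2)
next_batch = dataset.make_one_shot_iterator().get_next()

# log_device_placement prints the device each op is assigned to;
# the Dataset/Iterator ops show up on the CPU.
with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
    print(sess.run(next_batch))
```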
Upvotes: 0