Reputation: 1230
According to the documentation of tf.data.Dataset.shuffle, it fills a buffer with k elements and then shuffles within that buffer. However, I don't want the order of the data to change; I only want it to be buffered. Then I found tf.data.Dataset.prefetch, whose documentation says "This allows later elements to be prepared while the current element is being processed."
From that description I think prefetch
is what I want (i.e. pre-loading data while the previous data are being used in training), but while trying to look into the source of tf.data.Dataset.shuffle
to see whether it actually calls tf.data.Dataset.prefetch
, I got stuck at these lines (pasted below) and cannot find where shuffle_dataset_v3
is defined.
variant_tensor = gen_dataset_ops.shuffle_dataset_v3(
    input_dataset._variant_tensor,  # pylint: disable=protected-access
    buffer_size=self._buffer_size,
    seed=self._seed,
    seed2=self._seed2,
    seed_generator=gen_dataset_ops.dummy_seed_generator(),
    reshuffle_each_iteration=self._reshuffle_each_iteration,
    **self._flat_structure)
My main question is whether prefetch
replaces shuffle
for the purpose of buffering data. It would also be nice if someone could point me to where shuffle_dataset_v3
is implemented.
Upvotes: 0
Views: 414
Reputation: 106
Yes. prefetch
is the operation for buffering data without changing its order; shuffle buffers only as a means to randomize.
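A minimal sketch of the difference (assuming TensorFlow 2.x): prefetch keeps element order while overlapping production of later elements with consumption of the current one, whereas shuffle samples randomly from its buffer and so reorders the data.

```python
import tensorflow as tf

# prefetch: buffers up to 2 elements ahead of the consumer, never reorders.
prefetched = tf.data.Dataset.range(5).prefetch(2)
print(list(prefetched.as_numpy_iterator()))  # order preserved: [0, 1, 2, 3, 4]

# shuffle: fills a buffer of 5 elements and draws randomly from it,
# so the output order generally differs from the input order.
shuffled = tf.data.Dataset.range(5).shuffle(buffer_size=5)
print(sorted(shuffled.as_numpy_iterator()))  # same elements, random order
```

So if the goal is only to overlap data loading with training, prefetch alone is enough; shuffle is not needed.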
gen_dataset_ops
and the other gen_xxx_ops
modules are not included in the source repository because they are automatically generated by Bazel at build time to wrap the C++ op implementations for use in Python. You should be able to find the gen_xxx_ops
code in your local installation, for example ${PYTHON_ROOT}/site-packages/tensorflow/python/ops/gen_dataset_ops.py
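Rather than guessing the path, you can ask the installed module itself where its generated file lives (the exact path will vary per installation):

```python
# gen_dataset_ops is the Bazel-generated Python wrapper around the C++ ops.
from tensorflow.python.ops import gen_dataset_ops

# __file__ points at the generated .py file inside your site-packages.
print(gen_dataset_ops.__file__)
```

Opening that file shows the generated shuffle_dataset_v3 wrapper; the actual kernel is implemented in the TensorFlow C++ sources.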
Upvotes: 1