Reputation: 2621
I have a CPU+GPU instance that I'm using to train TF models. My data is on an SSD. I use tf's Dataset API with interleaving, mapping, and no py_function so that it runs efficiently without being I/O bound. It was working well, with <1% of time spent waiting on input data, but I can't track down the change that caused the program to become I/O bound. In short, the code loads .npy files using tf.data.FixedLengthRecordDataset, stacks them, and batches them (a sketch of what that parser might look like follows the pipeline code below). Any hints you can see from the profile? It looks sparse, with a lot of interruptions, as if parallelism isn't working properly.
ds = dataset.interleave(
    numpy_file_parser,
    cycle_length=tf.data.experimental.AUTOTUNE,  # the second positional arg is cycle_length, not num_parallel_calls
)
ds_train = (ds
    .repeat()
    .shuffle(1000, reshuffle_each_iteration=True)
    .batch(batch_size)
    .prefetch(tf.data.experimental.AUTOTUNE)
)
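For context, numpy_file_parser isn't shown in the question; below is a minimal sketch of what a FixedLengthRecordDataset-based parser for .npy files might look like. The header size, record shape, and dtype are assumptions for illustration, not the asker's actual values:

import numpy as np
import tensorflow as tf

NPY_HEADER_BYTES = 128                          # assumption: fixed-size .npy header
RECORD_SHAPE = (64, 64)                         # assumption: shape of one stored array
RECORD_BYTES = int(np.prod(RECORD_SHAPE)) * 4   # assumption: float32 records

def numpy_file_parser(filename):
    # Read fixed-length raw records from a .npy file, skipping its header.
    records = tf.data.FixedLengthRecordDataset(
        filename,
        record_bytes=RECORD_BYTES,
        header_bytes=NPY_HEADER_BYTES,
    )
    # Decode the raw bytes back into float32 tensors with the original shape.
    return records.map(
        lambda rec: tf.reshape(tf.io.decode_raw(rec, tf.float32), RECORD_SHAPE),
        num_parallel_calls=tf.data.experimental.AUTOTUNE,
    )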
For comparison, here is the profile from when the pipeline was not I/O bound.
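For anyone trying to reproduce this kind of trace: the TensorFlow profiler can capture it programmatically. A minimal sketch; the log directory and step count are arbitrary, and train_step is a hypothetical training-step function:

import tensorflow as tf

# Capture a profiler trace around a few training steps; view it later
# in TensorBoard's Profile tab (tensorboard --logdir logs).
tf.profiler.experimental.start('logs')
for step, batch in enumerate(ds_train.take(100)):
    train_step(batch)  # hypothetical training-step function
tf.profiler.experimental.stop()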
Upvotes: 1
Views: 170
Reputation: 2621
Turns out it was caused by TF 2.3.0. I'm using a GPU with compute capability 6.1, which is not fully supported in TF 2.3 (this is mentioned in the release notes).
Reverting to TF 2.2 fixes the problem.
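If you want to confirm which compute capability your GPU reports before downgrading, one way is the sketch below. Note that device_lib is not a public API, though it has been stable across TF 2.x:

from tensorflow.python.client import device_lib

# physical_device_desc includes a "compute capability: X.Y" string for GPUs.
for d in device_lib.list_local_devices():
    if d.device_type == 'GPU':
        print(d.physical_device_desc)

# Then pin the working release, e.g.:
#   pip install tensorflow==2.2.0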
Upvotes: 1