Paul Bendevis

Reputation: 2621

Tensorflow GPU/CPU Performance Suddenly Input Bound

I have a CPU+GPU instance that I'm using to train TF models. My data is on an SSD. I have used TF's Dataset API with interleaving, mapping, and no py_function so that it runs efficiently without being I/O bound. It was working well, with <1% of time spent waiting on input data, but I can't track down the change that caused the program to become I/O bound. A quick summary of the code: it loads .npy files using tf.data.FixedLengthRecordDataset, stacks them, and batches them. Any hints you can see from the profile? It looks sparse, with a lot of interruptions, as if parallelism isn't working properly.

ds = dataset.interleave(
    numpy_file_parser, tf.data.experimental.AUTOTUNE
)

ds_train = (ds
            .repeat()
            .shuffle(1000, reshuffle_each_iteration=True)
            .batch(batch_size)
            .prefetch(tf.data.experimental.AUTOTUNE)
           )
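For context, here is a minimal sketch of what the numpy_file_parser and the source dataset used above could look like. The sample shape, dtype, header size, and glob path are assumptions for illustration, not taken from the original code.

import tensorflow as tf

# Assumed layout: each .npy file holds float32 samples of SAMPLE_SHAPE,
# written with a standard 128-byte NumPy header (typical for .npy v1.0 files).
SAMPLE_SHAPE = (64, 64)
BYTES_PER_SAMPLE = 4 * 64 * 64   # float32
NPY_HEADER_BYTES = 128

def numpy_file_parser(filename):
    # Read fixed-size raw records from one .npy file, skipping its header.
    records = tf.data.FixedLengthRecordDataset(
        filename,
        record_bytes=BYTES_PER_SAMPLE,
        header_bytes=NPY_HEADER_BYTES,
    )
    # Decode each raw record into a float32 tensor of the expected shape.
    return records.map(
        lambda raw: tf.reshape(tf.io.decode_raw(raw, tf.float32), SAMPLE_SHAPE),
        num_parallel_calls=tf.data.experimental.AUTOTUNE,
    )

# File-level dataset that interleave() draws from (hypothetical path).
dataset = tf.data.Dataset.list_files("data/*.npy")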

Inefficient attempt: [profiler screenshot]

For comparison, here is the profile from when it was not I/O bound: [profiler screenshot]

Upvotes: 1

Views: 170

Answers (1)

Paul Bendevis

Reputation: 2621

Turns out it was caused by TF 2.3.0. I'm using a compute capability 6.1 GPU, which is not fully supported in TF 2.3. From the release notes:

  • GPU: TF 2.3 includes PTX kernels only for compute capability 7.0 to reduce the TF pip binary size. Earlier releases included PTX for a variety of older compute capabilities.

Reverting to TF 2.2 fixes the problem.
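If you want to check whether your own GPU is affected before downgrading, here is a short sketch (tf.config.experimental.get_device_details is available from TF 2.3 onward; the 7.0 threshold comes from the release note quoted above):

import tensorflow as tf

# Print the compute capability of every visible GPU and flag devices below
# the 7.0 level that the TF 2.3 pip wheel ships PTX kernels for.
for gpu in tf.config.list_physical_devices("GPU"):
    details = tf.config.experimental.get_device_details(gpu)
    major, minor = details.get("compute_capability", (0, 0))
    name = details.get("device_name", gpu.name)
    print(f"{name}: compute capability {major}.{minor}")
    if (major, minor) < (7, 0):
        print("  -> below the compute capability 7.0 PTX coverage "
              "mentioned in the TF 2.3 release notes")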

Upvotes: 1
