john-mueller

Reputation: 117

How can the TensorFlow 2 data pipeline be optimized?

I work with a large image dataset, which I first convert to TFRecords and then load into a tf.data.Dataset.

But the dataset is so big that I can't use a batch size larger than 10, despite having a 12 GB GPU. How can I optimize the loading of the images so that I can use a bigger batch_size?

Could .fit_generator() perhaps be used to optimize this process?

Here is my current loading process for the training data (the validation data is transformed in the same way and is therefore not shown here):

train_dataset = dataset.load_tfrecord_dataset(dataset_path, class_names_path, image_size)
train_dataset = train_dataset.shuffle(buffer_size=shuffle_buffer)
train_dataset = train_dataset.batch(batch_size)
train_dataset = train_dataset.map(lambda x, y: (
        dataset.transform_images(x, image_size),
        dataset.transform_targets(y, anchors, anchor_masks, image_size)))
train_dataset = train_dataset.prefetch(batch_size)

Start of my training phase:

history = model.fit(train_dataset,
                    epochs=epochs,
                    callbacks=callbacks,
                    validation_data=val_dataset)

Upvotes: 0

Views: 52

Answers (1)

Timbus Calin

Reputation: 14983

Unfortunately, there are some constraints that depend on the hardware (here, the amount of GPU memory), regardless of how much we optimize from a software perspective.

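In your case, the only way to increase the batch size would be to lower the dimensions of the images. The GPU memory needed during training grows roughly in proportion to the batch size times the number of pixels per image, because the intermediate activations stored for backpropagation scale with both. With the model and the 12 GB fixed, a larger batch only fits if each image is smaller.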
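A back-of-the-envelope sketch of that scaling in plain Python. The 416x416 input and 32 channels are hypothetical example values (not taken from your code), and the estimate covers a single float32 feature map only, ignoring weights, gradients, optimizer state and framework overhead:

```python
# Rough illustration only: the memory of one feature map of shape
# (batch, height, width, channels) is the product of its dimensions
# times the bytes per value (4 for float32).
def activation_bytes(batch_size, height, width, channels, bytes_per_value=4):
    """Bytes needed to store one feature map of shape (B, H, W, C)."""
    return batch_size * height * width * channels * bytes_per_value


def batch_scale_for_resize(old_side, new_side):
    """Factor by which the batch can grow, at equal activation memory,
    when square images are resized from old_side to new_side pixels."""
    return (old_side / new_side) ** 2


# Hypothetical values: 416x416 input, 32 channels in an early layer.
full = activation_bytes(10, 416, 416, 32)  # batch 10 at full resolution
half = activation_bytes(40, 208, 208, 32)  # batch 40 at half the side length
print(full == half)                        # True
print(batch_scale_for_resize(416, 208))    # 4.0
```

Halving the side length divides the pixel count by four, which is why the batch can grow roughly fourfold in this idealized model.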

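tf.data.Dataset is an excellent API for manipulating data, and using the correct preprocessing steps, such as prefetch, can indeed make your input processing faster. However, this improves throughput (the GPU waits less for data); it does not reduce the memory that a single training step needs on the GPU.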
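To illustrate what prefetching does, here is a minimal plain-Python sketch of the idea: a background thread fills a bounded buffer while the consumer works. It is only an analogy and not how tf.data implements it; the function name prefetch here is hypothetical. In tf.data, the argument of .prefetch() counts dataset elements, and after .batch() each element is a whole batch, so .prefetch(batch_size) buffers batch_size batches; tf.data.AUTOTUNE (tf.data.experimental.AUTOTUNE in older 2.x releases) lets the runtime choose the buffer size.

```python
import queue
import threading


def prefetch(iterable, buffer_size=2):
    """Yield items from `iterable`, letting a background thread produce
    up to `buffer_size` items ahead of the consumer."""
    buf = queue.Queue(maxsize=buffer_size)
    done = object()  # sentinel marking the end of the stream

    def producer():
        try:
            for item in iterable:
                buf.put(item)  # blocks while the buffer is full
        finally:
            buf.put(done)  # always signal the end, even if iteration fails

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = buf.get()
        if item is done:
            return
        yield item
```

For example, list(prefetch(range(5))) returns [0, 1, 2, 3, 4]; order is preserved because a single producer thread fills a FIFO queue. The buffer bounds how far ahead the producer runs, not how much memory each consumed item needs.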

Nevertheless, because of this hardware constraint you cannot increase the batch size at the current image size. Either decrease the image size so that a larger batch fits, or switch to a GPU with more VRAM (16 GB or more).
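The same trade-off as simple arithmetic: a minimal sketch assuming GPU memory splits into a fixed part (weights, gradients, optimizer state, framework overhead) and a part that grows linearly with the batch. The function max_batch_size and all numbers are hypothetical; the costs were picked only so that 12 GB reproduces the batch size of 10 from the question, and they are not measurements of that model.

```python
def max_batch_size(vram_gb, fixed_gb, per_sample_gb):
    """Largest whole batch that fits in `vram_gb`, given a fixed cost and
    an activation cost per sample. All inputs are rough estimates in GB."""
    free_gb = vram_gb - fixed_gb
    # Floor division gives the number of whole samples that fit;
    # clamp at 0 in case the fixed cost alone exceeds the VRAM.
    return max(0, int(free_gb // per_sample_gb))


print(max_batch_size(12, 2, 1.0))   # 10 (current GPU, current image size)
print(max_batch_size(16, 2, 1.0))   # 14 (16 GB GPU, same image size)
print(max_batch_size(12, 2, 0.25))  # 40 (current GPU, half the side length)
```

Under these assumptions, shrinking the images raises the batch size much more than a moderately larger GPU does, because the per-sample cost falls with the square of the side length.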

Upvotes: 1
