TensorFlow Datasets: Make batches with differently shaped data

import tensorflow_datasets as tfds
import tensorflow as tf

def input_fn():

    # Download and prepare the Oxford Flowers 102 dataset
    dataset_builder = tfds.builder("oxford_flowers102")
    dataset_builder.download_and_prepare()

    # Build the training pipeline: repeat indefinitely, then batch
    ds = dataset_builder.as_dataset(split=tfds.Split.TRAIN)
    ds = ds.repeat()
    ds = ds.batch(32)
    return ds

This will result in:

InvalidArgumentError: Cannot batch tensors with different shapes in component 1. 
First element had shape [500,666,3] and element 1 had shape [752,500,3]. 
[Op:IteratorGetNextSync]

This can be solved by mapping a resize/pad function that returns images of the same shape, as shown here and here:

ds = ds.map(resize_or_pad_function)
ds = ds.batch(...)
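
For instance, a minimal sketch of that approach using TF 2.x's tf.image.resize_with_pad, continuing from the ds in the question (the 224 x 224 target size and the feature keys kept are assumptions for illustration):

def resize_or_pad(example):
    # Letterbox every image to a fixed (assumed) 224 x 224 size
    image = tf.image.resize_with_pad(example["image"], 224, 224)
    return {"image": image, "label": example["label"]}

ds = ds.map(resize_or_pad)
ds = ds.batch(32)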

However, I do not want to resize or pad my images, as I want to retain their original size and aspect ratio. This is for training a convolutional neural network that can accept varied image sizes.

What do I do if I need batches of tensors with shape (32, None, None, 3) where each image has a different height and width?


Answers (1)

On a GPU, computation is accelerated by passing tensors of the same shape through the graph, where arithmetic and logic operations are applied to the whole tensor at once (as opposed to one element at a time). AFAIK that is all TensorFlow supports for now (even with eager execution).

These tensors can be filled with zeros and the calculations still go through, which is exactly what padding does. The network will eventually have to learn to ignore the activations caused by the black regions, since they do not vary with the label. So one has to do one of the following:

  1. Crop the images
  2. Pad the images
  3. Combination of both

any of which must return a tensor with a definite shape for any given batch.
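
As one possible sketch of the padding option, tf.data's padded_batch pads every image in a batch up to the largest height and width in that batch. Continuing from the ds in the question, and assuming only the image and label features are needed:

def to_pair(example):
    # Keep only the fields needed for training
    return example["image"], example["label"]

ds = ds.map(to_pair)
# Pad height and width per batch; labels are scalars, so they need no padding
ds = ds.padded_batch(32, padded_shapes=([None, None, 3], []))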

Nonetheless, it is still possible to have a neural network that accepts images of any size during the predict/evaluate phase, as these will not be in batches (when run in real time).
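
For example, a fully convolutional network with global pooling can take variable-sized inputs. A minimal Keras sketch (the layer sizes are arbitrary; 102 is the number of classes in oxford_flowers102):

import tensorflow as tf

inputs = tf.keras.Input(shape=(None, None, 3))            # height and width left unspecified
x = tf.keras.layers.Conv2D(32, 3, activation="relu")(inputs)
x = tf.keras.layers.Conv2D(64, 3, activation="relu")(x)
x = tf.keras.layers.GlobalAveragePooling2D()(x)           # collapses the variable spatial dims
outputs = tf.keras.layers.Dense(102, activation="softmax")(x)
model = tf.keras.Model(inputs, outputs)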

During training, to avoid

  1. information loss: e.g. resizing a 2048 x 2048 image to a 28 x 28 image
  2. excessive padding: e.g. padding a 28 x 28 image with zeros to make it 2048 x 2048

it's best to group images of nearly the same size together and split the dataset into batches of different image sizes (while keeping all images within any one batch the same size), as @xenotecc mentioned in the comments.
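
One way to sketch this bucketing with tf.data is group_by_window, which batches together elements that share a key. Continuing from the ds in the question (the 100-pixel height buckets and the window size are arbitrary assumptions; TF 2.2+ can infer padded_shapes automatically):

def key_fn(example):
    # Bucket images by height in 100-pixel steps (arbitrary bucket width)
    return tf.cast(tf.shape(example["image"])[0] // 100, tf.int64)

def reduce_fn(key, window):
    # Within a bucket, pad to the largest image in that window and batch
    return window.padded_batch(32)

ds = ds.apply(tf.data.experimental.group_by_window(key_fn, reduce_fn, window_size=32))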

This is also true of other frameworks that use the GPU to accelerate computation, as of now. Feel free to add answers if/when this becomes possible.
