psj

Reputation: 376

Use preprocessing function that changes size of input on ImageDataGenerator

I wish to take the FFT of the input dataset loaded using ImageDataGenerator. Taking the FFT will double the number of channels, as I stack the real and imaginary parts of the complex FFT output along the channels dimension. The preprocessing_function attribute of the ImageDataGenerator class must output a NumPy tensor with the same shape as the input, so I could not use it. I tried applying tf.math.fft2d directly to the ImageDataGenerator.flow_from_directory() output, but it consumes too much RAM, causing the program to crash on Google Colab. I also tried adding a custom layer that computes the FFT as the first layer of my neural network, but this adds to the training time. So I wish to do it as a preprocessing step. Could anyone kindly suggest an efficient way to apply such a function to the output of ImageDataGenerator?
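To illustrate the shape problem described above (sketched here with NumPy for brevity; the question itself uses TensorFlow): taking a 2-D FFT per channel and stacking real and imaginary parts turns an H x W x C image into H x W x 2C, which violates the same-shape contract of preprocessing_function:

```python
import numpy as np

# A dummy H x W x C image standing in for one ImageDataGenerator sample
image = np.random.rand(224, 224, 3).astype(np.float32)

# 2-D FFT over the spatial axes, applied independently to each channel;
# the result is complex-valued but keeps the same shape
spectrum = np.fft.fft2(image, axes=(0, 1))

# Stacking real and imaginary parts along the channel axis doubles C,
# so the output shape no longer matches the input shape
stacked = np.concatenate([spectrum.real, spectrum.imag], axis=-1)
print(stacked.shape)  # (224, 224, 6)
```

This is exactly why preprocessing_function, which must return a tensor of the same shape, cannot perform this transform.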

Upvotes: 0

Views: 331

Answers (1)

Nicolas Gervais

Reputation: 36684

You can write a custom data pipeline instead of ImageDataGenerator, but I have no reason to think this is any faster than doing it in the first layer. It seems like a costly operation either way, since tf.signal.fft2d takes complex64 or complex128 dtypes. So it needs a cast, and then a cast back, because neural network weights are tf.float32 and other image processing functions don't take complex dtypes.

import tensorflow as tf

labels = ['Cats', 'Dogs', 'Others']

def read_image(file_name):
  image = tf.io.read_file(file_name)
  image = tf.image.decode_jpeg(image, channels=3)
  image = tf.image.convert_image_dtype(image, tf.float32)
  image = tf.image.resize_with_pad(image, target_height=224, target_width=224)
  image = tf.cast(image, tf.complex64)
  # fft2d transforms the two inner-most dimensions, so move channels first
  # to take the FFT over height and width, then restore channels-last
  image = tf.transpose(image, [2, 0, 1])
  image = tf.signal.fft2d(image)
  image = tf.transpose(image, [1, 2, 0])
  # the label is the name of the parent directory
  label = tf.strings.split(file_name, '\\')[-2]
  label = tf.where(tf.equal(label, labels))
  return image, label

ds = tf.data.Dataset.list_files(r'path\to\my\pictures\*\*.jpg')

ds = ds.map(read_image)

next(iter(ds))
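If you then want the real/imaginary channel stacking described in the question, one possible follow-up (my own sketch, not part of the original answer) is a second mapped function that splits the complex spectrum back into a float32 tensor with twice the channels:

```python
import tensorflow as tf

def split_complex(image, label):
    # (H, W, C) complex64 -> (H, W, 2C) float32:
    # real parts in the first C channels, imaginary parts in the last C
    image = tf.concat([tf.math.real(image), tf.math.imag(image)], axis=-1)
    return image, label

# Demo on a synthetic complex "spectrum" standing in for one dataset element
spectrum = tf.cast(tf.random.uniform([224, 224, 3]), tf.complex64)
image, label = split_complex(spectrum, tf.constant(0))
print(image.shape, image.dtype)  # (224, 224, 6), float32
```

In the pipeline above this would be chained as ds.map(split_complex), so the model receives an ordinary real-valued tensor.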

Upvotes: 1

Related Questions