Arun

Reputation: 2478

TensorFlow 2.0 Data Augmentation: tf.keras.preprocessing.image.ImageDataGenerator flow() method

I am trying to perform data augmentation using TensorFlow 2.2.0 and Python 3.7 for a LeNet-300-100 dense neural network on the MNIST dataset. The code I have is as follows:

import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator

batch_size = 60
num_classes = 10
num_epochs = 100


# Data preprocessing and cleaning:
# input image dimensions
img_rows, img_cols = 28, 28

# Load MNIST dataset-
(X_train, y_train), (X_test, y_test) = tf.keras.datasets.mnist.load_data()


if tf.keras.backend.image_data_format() == 'channels_first':
    X_train = X_train.reshape(X_train.shape[0], 1, img_rows, img_cols)
    X_test = X_test.reshape(X_test.shape[0], 1, img_rows, img_cols)
    input_shape = (1, img_rows, img_cols)
else:
    X_train = X_train.reshape(X_train.shape[0], img_rows, img_cols, 1)
    X_test = X_test.reshape(X_test.shape[0], img_rows, img_cols, 1)
    input_shape = (img_rows, img_cols, 1)

print("\n'input_shape' which will be used = {0}\n".format(input_shape))
# 'input_shape' which will be used = (28, 28, 1)


# Convert datasets to floating point types-
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')

# Normalize the training and testing datasets-
X_train /= 255.0
X_test /= 255.0

# Convert class vectors/targets to one-hot encoded binary class matrices-
y_train = tf.keras.utils.to_categorical(y_train, num_classes)
y_test = tf.keras.utils.to_categorical(y_test, num_classes)
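For reference, to_categorical maps an integer label to a one-hot row; a minimal pure-Python stand-in (the helper name one_hot is just for illustration, not part of the Keras API):

```python
def one_hot(label, num_classes):
    # Minimal stand-in for tf.keras.utils.to_categorical on a single label:
    # a row of zeros with a 1.0 at the label's index.
    row = [0.0] * num_classes
    row[label] = 1.0
    return row

print(one_hot(3, 10))
# [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```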


X_train.shape, y_train.shape
# ((60000, 28, 28, 1), (60000, 10))

X_test.shape, y_test.shape
# ((10000, 28, 28, 1), (10000, 10))


# Example of using the tf.keras.preprocessing.image.ImageDataGenerator class's flow(x, y) method:

datagen = ImageDataGenerator(
    # featurewise_center=True,
    # featurewise_std_normalization=True,
    rotation_range = 20,
    width_shift_range = 0.2,
    height_shift_range = 0.2,
    horizontal_flip = True
    )

Now, when I count the number of batches produced by 'datagen.flow()' with the following code:

# Sanity check-
i = 0

for x, y in datagen.flow(X_train, y_train, batch_size = batch_size, shuffle = True):
    # print("\ntype(x) = {0}, type(y) = {1}".format(type(x), type(y)))
    # print("x.shape = {0}, y.shape = {1}\n".format(x.shape, y.shape))
    print(i, end = ', ')
    i += 1

The value of i keeps increasing and the loop never terminates, so something is clearly going wrong. As far as I know, the number of batches = number of training examples / batch size. Therefore, in this example, the number of batches = 60000 / 60 = 1000.
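That expected batch count follows directly from the numbers above (plain arithmetic, no TensorFlow needed):

```python
n_samples = 60000   # MNIST training examples
batch_size = 60

# One epoch should cover every example exactly once:
steps_per_epoch = n_samples // batch_size
print(steps_per_epoch)  # 1000
```

Note, however, that flow() does not stop after this many batches on its own; the consuming loop has to stop it.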

Then why is it producing so many batches of augmented data? How can I stop it, and what is going wrong?

Thanks!

Upvotes: 0

Views: 784

Answers (1)

t.okuda
t.okuda

Reputation: 41

By default, ImageDataGenerator generates batches indefinitely. You can break out of the for loop, as described here: How to find how many Image Generated By ImageDataGenerator
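That break-out pattern can be sketched like this (pure Python so it runs without TensorFlow; infinite_batches is a hypothetical stand-in for datagen.flow(), which likewise yields batches forever):

```python
import itertools

n_samples, batch_size = 60000, 60
steps_per_epoch = n_samples // batch_size  # 1000 full batches per epoch

def infinite_batches():
    # Stand-in for datagen.flow(X_train, y_train, ...): yields batches forever.
    for i in itertools.count():
        yield i  # a real generator would yield (x_batch, y_batch)

batches_seen = 0
for batch in infinite_batches():
    batches_seen += 1
    if batches_seen >= steps_per_epoch:
        break  # stop after one full pass over the training data

print(batches_seen)  # 1000
```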

Alternatively, you can pass the steps_per_epoch parameter to fit() during training.

Note, however, that TensorFlow >= 2.0 does not support multiprocessing with ImageDataGenerator, so it can become a bottleneck during training.

Upvotes: 1
