I am trying to perform data augmentation using TensorFlow 2.2.0 and Python 3.7 for a LeNet-300-100 dense neural network on the MNIST dataset. My code is as follows:
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator
batch_size = 60
num_classes = 10
num_epochs = 100
# Data preprocessing and cleaning:
# input image dimensions
img_rows, img_cols = 28, 28
# Load MNIST dataset-
(X_train, y_train), (X_test, y_test) = tf.keras.datasets.mnist.load_data()
if tf.keras.backend.image_data_format() == 'channels_first':
    X_train = X_train.reshape(X_train.shape[0], 1, img_rows, img_cols)
    X_test = X_test.reshape(X_test.shape[0], 1, img_rows, img_cols)
    input_shape = (1, img_rows, img_cols)
else:
    X_train = X_train.reshape(X_train.shape[0], img_rows, img_cols, 1)
    X_test = X_test.reshape(X_test.shape[0], img_rows, img_cols, 1)
    input_shape = (img_rows, img_cols, 1)
print("\n'input_shape' which will be used = {0}\n".format(input_shape))
# 'input_shape' which will be used = (28, 28, 1)
# Convert datasets to floating point types-
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
# Normalize the training and testing datasets-
X_train /= 255.0
X_test /= 255.0
# convert class vectors/target to binary class matrices or one-hot encoded values-
y_train = tf.keras.utils.to_categorical(y_train, num_classes)
y_test = tf.keras.utils.to_categorical(y_test, num_classes)
X_train.shape, y_train.shape
# ((60000, 28, 28, 1), (60000, 10))
X_test.shape, y_test.shape
# ((10000, 28, 28, 1), (10000, 10))
# Augmentation settings, used below with ImageDataGenerator.flow(x, y):
datagen = ImageDataGenerator(
# featurewise_center=True,
# featurewise_std_normalization=True,
rotation_range = 20,
width_shift_range = 0.2,
height_shift_range = 0.2,
horizontal_flip = True
)
Now, when I count the number of batches produced by 'datagen.flow()' with the following code:
# Sanity check-
i = 0
for x, y in datagen.flow(X_train, y_train, batch_size = batch_size, shuffle = True):
    # print("\ntype(x) = {0}, type(y) = {1}".format(type(x), type(y)))
    # print("x.shape = {0}, y.shape = {1}\n".format(x.shape, y.shape))
    print(i, end = ', ')
    i += 1
The value of i keeps increasing and the loop never terminates, so something is clearly going wrong. As far as I know, number of batches = number of training examples / batch size, so in this example the number of batches should be 60000 / 60 = 1000.
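The expected count can be checked with plain arithmetic. The sketch below assumes that one epoch covers every example exactly once and that a final partial batch counts as one batch (the rounding-up behaviour of the Keras `Iterator.__len__`); `batches_per_epoch` is a hypothetical helper, not a Keras function.

```python
def batches_per_epoch(num_examples: int, batch_size: int) -> int:
    """Hypothetical helper: batches needed to see every example once.

    The last batch may be smaller than batch_size, so the division rounds up
    (integer ceiling division, avoiding floating point).
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return (num_examples + batch_size - 1) // batch_size


print(batches_per_epoch(60000, 60))  # 1000, as computed in the question
print(batches_per_epoch(60000, 64))  # 938: 937 full batches plus one partial batch
```

This number is the size of one epoch, but it is not a limit that `datagen.flow()` enforces on its own.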
Then why is it producing so many batches of augmented data? And how can I stop it? What's going wrong?
Thanks!
By default, ImageDataGenerator generates batches indefinitely: the iterator returned by flow() never stops on its own. You can break out of the for loop yourself, as described here: How to find how many Image Generated By ImageDataGenerator
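Breaking out of the loop can be sketched in plain Python. Here `endless_batches` is a hypothetical stand-in that mimics the endless looping described above (it reshuffles and starts a new pass instead of stopping); in the real code you would iterate over `datagen.flow(X_train, y_train, batch_size=batch_size, shuffle=True)` instead. With the real iterator, `len(datagen.flow(...))` should give the number of batches per epoch, a convenient value for `num_batches`.

```python
import itertools
import random


def endless_batches(data, batch_size, seed=0):
    """Hypothetical stand-in for datagen.flow(): yields batches forever.

    After the last (possibly partial) batch it reshuffles and starts a new
    pass over the data, so a plain for loop over it never terminates.
    """
    rng = random.Random(seed)
    indices = list(range(len(data)))
    while True:
        rng.shuffle(indices)
        for start in range(0, len(indices), batch_size):
            yield [data[i] for i in indices[start:start + batch_size]]


data = list(range(600))  # toy dataset standing in for X_train
batch_size = 60
num_batches = (len(data) + batch_size - 1) // batch_size  # 10 batches = one epoch

# Option 1: count batches and break explicitly.
seen = 0
for batch in endless_batches(data, batch_size):
    seen += 1
    if seen >= num_batches:
        break

# Option 2: let itertools.islice stop after num_batches batches.
one_epoch = list(itertools.islice(endless_batches(data, batch_size), num_batches))

print(seen, len(one_epoch))  # 10 10
```

`itertools.islice` avoids the manual counter: it simply stops pulling from the generator once `num_batches` items have been taken.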
Alternatively, you can pass the steps_per_epoch parameter to fit() when training.
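With `steps_per_epoch` set, each epoch ends after that many batches even though the generator itself never runs out. The pure-Python sketch below only simulates that bookkeeping; `simulate_fit` and `endless_counter` are hypothetical and count batches without training anything. With TensorFlow, the corresponding call would be along the lines of `model.fit(datagen.flow(X_train, y_train, batch_size=batch_size), steps_per_epoch=len(X_train) // batch_size, epochs=num_epochs)`.

```python
import itertools


def endless_counter():
    """Hypothetical infinite generator standing in for datagen.flow()."""
    yield from itertools.count()


def simulate_fit(generator, steps_per_epoch, epochs):
    """Count how many batches a fit-style loop draws per epoch (no training).

    Each epoch pulls exactly steps_per_epoch batches from the same generator,
    so the loop terminates even though the generator is infinite.
    """
    drawn_per_epoch = []
    for _ in range(epochs):
        drawn = 0
        for _batch in itertools.islice(generator, steps_per_epoch):
            drawn += 1  # a real training step would run here
        drawn_per_epoch.append(drawn)
    return drawn_per_epoch


steps = 60000 // 60  # steps_per_epoch for the question's data: 1000
print(simulate_fit(endless_counter(), steps_per_epoch=steps, epochs=3))
# [1000, 1000, 1000]
```

Because the same generator is reused, the second epoch continues where the first one stopped rather than restarting, which is also how a single `datagen.flow()` iterator is consumed across epochs.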
However, TensorFlow >= 2.0 does not support multiprocessing, so ImageDataGenerator might become a bottleneck during training.