Reputation: 791
I'm trying to do image classification with the Inception V3 model. Does ImageDataGenerator from Keras create new images which are added onto my dataset? If I have 1000 images, will using this function double it to 2000 images which are used for training? Is there a way to know how many images were created and fed into the model?
Upvotes: 48
Views: 27275
Reputation: 19
Let me explain this as simply as possible with an example.
For example: suppose you apply ImageDataGenerator to the dataset with batch_size = 25 and set steps_per_epoch=total_samples/batch_size. If steps_per_epoch is equal to 20, the dataset has 20 * 25 = 500 images, and the model trains on all of them (transformed by ImageDataGenerator) in each epoch.
Upvotes: 1
Reputation: 63
The ImageDataGenerator class ensures that the model receives new variations of the images at each epoch. However, it only returns the transformed images and does not add them to the original corpus of images. If it did, the model would see the original images multiple times, which would make overfitting more likely.
Upvotes: 1
Reputation: 33410
Short answer: 1) All the original images are just transformed (i.e. rotation, zooming, etc.) every epoch and then used for training, and 2) [Therefore] the number of images in each epoch is equal to the number of original images you have.
Long answer: In each epoch, the ImageDataGenerator applies a transformation to the images you have and uses the transformed images for training. The set of transformations includes rotation, zooming, etc. By doing this you are, in a sense, creating new data (this is what is called data augmentation), but obviously the generated images are not totally different from the original ones. This way the learned model may be more robust and accurate, as it is trained on different variations of the same image.
You need to set the steps_per_epoch argument of the fit method to n_samples / batch_size, where n_samples is the total number of training samples you have (i.e. 1000 in your case). This way, in each epoch each training sample is augmented only once, and therefore 1000 transformed images are generated in each epoch.
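The arithmetic above can be sketched as follows. This is only an illustration: the helper name steps_per_epoch_for is hypothetical and not part of Keras. In Python 3, n_samples / batch_size is a float and is not always a whole number, so one common choice (an assumption here, not something the answer prescribes) is to round up so that the final, smaller batch is still counted.

```python
import math


def steps_per_epoch_for(n_samples: int, batch_size: int) -> int:
    """Batches needed to cover every sample once (hypothetical helper, not a Keras API)."""
    # Round up so a final partial batch still counts as one step.
    return math.ceil(n_samples / batch_size)


# 1000 images from the question, with an assumed batch size of 32.
print(steps_per_epoch_for(1000, 32))
# A dataset of 500 images with batch size 25 divides evenly.
print(steps_per_epoch_for(500, 25))
```

With a batch size that divides the dataset evenly, rounding up changes nothing; otherwise it adds one extra step for the leftover samples.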
Further, I think it's worth clarifying the meaning of "augmentation" in this context: basically we are augmenting the images when we use ImageDataGenerator and enable its augmentation capabilities. But the word "augmentation" here does not mean that if we have, say, 100 original training images we end up having 1000 images per epoch after augmentation (i.e. the number of training images does not increase per epoch). Instead, it means we use a different transformation of each image in each epoch; hence, if we train our model for, say, 5 epochs, we have used 5 different versions of each original image in training (or 100 * 5 = 500 different images in the whole training, instead of using just the 100 original images in the whole training). To put it differently, the total number of unique images increases over the whole training from start to finish, and not per epoch.
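The counting argument above can be checked with a small simulation. This is a toy sketch in plain Python, not Keras code: the functions augment and run_epochs are hypothetical stand-ins, and the "images" are just 4x4 nested lists. It shows that each epoch sees exactly as many images as there are originals, while the number of transformed versions grows with the number of epochs.

```python
import random


def augment(image, rng):
    """Toy random transformation: optional horizontal flip plus a brightness shift."""
    shift = rng.uniform(-0.2, 0.2)
    flip = rng.random() < 0.5
    out = []
    for row in image:
        new_row = row[::-1] if flip else row
        out.append([pixel + shift for pixel in new_row])
    return out


def run_epochs(images, n_epochs, seed=0):
    """Return the number of images seen in each epoch and in total."""
    rng = random.Random(seed)
    per_epoch = []
    total = 0
    for _ in range(n_epochs):
        # Each original is transformed once per epoch; the originals are never extended.
        transformed = [augment(img, rng) for img in images]
        per_epoch.append(len(transformed))
        total += len(transformed)
    return per_epoch, total


# 100 toy "images", matching the 100-image example in the answer.
originals = [[[float(i)] * 4 for _ in range(4)] for i in range(100)]
per_epoch, total = run_epochs(originals, n_epochs=5)
print(per_epoch)       # images seen in each of the 5 epochs
print(total)           # transformed versions produced over the whole run
print(len(originals))  # the original dataset size is unchanged
```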
Upvotes: 75
Reputation: 8527
As the official documentation states, ImageDataGenerator generates batches of tensor image data with real-time data augmentation. The data will be looped over (in batches). This means it applies random transformations to each batch of images on the fly. For instance:
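# Import path assumes a TensorFlow 2.x install; it may differ in other Keras versions.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(
    rescale=1./255,        # scale pixel values from integers 0-255 to floats 0-1
    shear_range=0.2,       # random shear
    zoom_range=0.2,        # random zoom in or out
    horizontal_flip=True)  # random horizontal flip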
At every new epoch, new random transformations are applied, so we train on a slightly different set of images each time. Obtaining more data is not always possible, and ImageDataGenerator helps in exactly that situation.
Upvotes: 8
Reputation: 1337
Here is my attempt to answer as I also had this question on my mind.
ImageDataGenerator will NOT add new images to your data set, in the sense that it will not make your epochs bigger. Instead, in each epoch it will provide slightly altered images (depending on your configuration). It will always generate new images, no matter how many epochs you have.
So in each epoch the model will train on different images, but not too different. This should help reduce overfitting and in some way simulates online learning.
All these alterations happen in memory, but if you want to see these images you can save them to disk, inspect them, see how many of them were generated and get a sense of how ImageDataGenerator works. To do this, pass save_to_dir='/tmp/img-data-gen-outputs' to the flow_from_directory function. See the docs.
Upvotes: 32
Reputation: 1416
It all depends on how many epochs you run. As @today answered, fitting the model with the generator will make the generator provide as many images as needed, depending on steps_per_epoch.
To make things easier to understand, put e.g. 20 images into any two folders (mimicking classified data), create a generator from the parent folder and run a simple for loop:
count = 0
for image, label in my_test_generator:
    count += 1
    print(count)
The first thing you should confirm is that you see the message "Found 20 images belonging to 2 classes." The loop itself will NOT stop after 20 iterations (each iteration yields one batch, not a single image); it will just keep incrementing and printing endlessly (I got mine to 10k and stopped it manually). The generator will provide as many images as are requested, whether they are augmented or not.
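The endless behaviour can be imitated with a plain Python generator. This is only a sketch of the idea, not how Keras implements its iterator: endless_batches is a hypothetical function, and the samples are integers standing in for images. It wraps around to the start of the data after the last batch, so the caller has to decide when to stop, which is the role steps_per_epoch plays during training.

```python
from itertools import islice


def endless_batches(samples, batch_size):
    """Yield batches forever, wrapping around after the last sample (toy stand-in)."""
    while True:
        for start in range(0, len(samples), batch_size):
            yield samples[start:start + batch_size]


# 20 toy samples, as in the 20-image experiment above.
samples = list(range(20))
gen = endless_batches(samples, batch_size=5)

# Take a fixed number of batches; without islice the loop would never end.
batches = list(islice(gen, 12))
print(len(batches))                  # number of batches requested
print(sum(len(b) for b in batches))  # samples delivered across 3 passes over the data
```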
Upvotes: 4
Reputation: 7206
Also note that these augmented images are not kept anywhere: they are generated on the fly during training and discarded once they have been used. You can't read those augmented images again afterwards.
Not storing those images is a good idea, because we would run out of memory very quickly if we stored such a huge number of images.
Upvotes: 3