Mirza Munib Baig

Reputation: 311

How do I apply data augmentation to an entire dataset?


Hello everyone, I am new to machine learning and trying to learn it. I tried data augmentation on my dataset using the code below from the Keras website, but it processes only one image. I want it to go through the images in the dataset one by one and apply the augmentation techniques to each of them. I am not sure what to change. I would be very thankful for any help.

from keras.preprocessing.image import ImageDataGenerator, array_to_img, img_to_array, load_img

datagen = ImageDataGenerator(
        rotation_range=40,
        width_shift_range=0.2,
        height_shift_range=0.2,
        shear_range=0.2,
        zoom_range=0.2,
        horizontal_flip=True,
        fill_mode='nearest')

img = load_img('data/train/cats/cat.0.jpg')  # this is a PIL image

x = img_to_array(img)  # NumPy array of shape (height, width, 3) with the default channels_last format

x = x.reshape((1,) + x.shape)  # add a batch dimension: (1, height, width, 3)

i = 0
for batch in datagen.flow(x, batch_size=1,
                          save_to_dir='preview', save_prefix='cat', save_format='jpeg'):
    i += 1
    if i > 20:
        break  # otherwise the generator would loop indefinitely

Upvotes: 0

Views: 2461

Answers (1)

Nafiz Ahmed

Reputation: 567

I am assuming your dataset is in the data/train/cats/ folder. The code above reads a single image, augments it, and saves about 20 augmented versions of it (21 as written, since the loop breaks only once i exceeds 20).

To extend this to the whole folder, you can use the os or glob module to list the files in the directory and then loop over your block of code. For example:

import glob

list_of_files = glob.glob('data/train/cats/*.jpg')
for file in list_of_files:
  img = load_img(file)  # this is a PIL image
  x = img_to_array(img)  # NumPy array of shape (height, width, 3) with the default channels_last format
  .
  .
  .
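A fuller version of that loop might look like the sketch below. The helper names prefix_for and augment_each are hypothetical (they are not part of Keras), and datagen is assumed to be the ImageDataGenerator from the question. Using the source file name as save_prefix is a design choice of this sketch, so that the augmented copies of different images are easy to tell apart in the output folder.

```python
import glob
import os


def prefix_for(path):
    """Return the file name without directory or extension."""
    return os.path.splitext(os.path.basename(path))[0]


def augment_each(files, datagen, out_dir='preview', copies=20):
    """Save `copies` augmented versions of every image in `files`."""
    # Imported here so prefix_for can be used without Keras installed.
    from keras.preprocessing.image import img_to_array, load_img

    os.makedirs(out_dir, exist_ok=True)  # save_to_dir must already exist
    for path in files:
        x = img_to_array(load_img(path))
        x = x.reshape((1,) + x.shape)  # add the batch dimension
        flow = datagen.flow(x, batch_size=1,
                            save_to_dir=out_dir,
                            save_prefix=prefix_for(path),
                            save_format='jpeg')
        # Each batch drawn from the generator writes one image to out_dir.
        for i, _ in enumerate(flow, start=1):
            if i >= copies:
                break  # the generator never stops on its own
```

With the question's datagen, it could be called as augment_each(glob.glob('data/train/cats/*.jpg'), datagen).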

Rather than looping over your whole block of code, you can make better use of datagen.flow: instead of passing a single image as x, pass the whole dataset. If n is the total number of images, x will have shape (n, 150, 150, 3) with Keras's default channels_last ordering, or (n, 3, 150, 150) with channels_first, provided all images have the same size (150x150 here).

You can also vary n instead of using the full dataset length, which helps when the dataset does not fit in memory. For example, with n = 20, the first iteration reads the first 20 images and passes an x whose first dimension is 20, e.g. (20, 150, 150, 3) with channels_last ordering. The second iteration reads the next 20, and so on.
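The chunked reading could be sketched roughly as follows. The names chunked and load_chunk are hypothetical helpers, and resizing to 150x150 is an assumption made so that every image in a chunk has the same shape.

```python
import glob
from itertools import islice


def chunked(paths, n):
    """Yield successive lists of at most n paths."""
    it = iter(paths)
    while True:
        chunk = list(islice(it, n))
        if not chunk:
            return
        yield chunk


def load_chunk(paths, size=(150, 150)):
    """Load the given image files into one array of shape (len(paths), 150, 150, 3)."""
    # Imported here so chunked can be used without Keras or NumPy installed.
    import numpy as np
    from keras.preprocessing.image import img_to_array, load_img

    # Assumption: resize on load so all arrays stack into one block.
    return np.array([img_to_array(load_img(p, target_size=size)) for p in paths])


files = sorted(glob.glob('data/train/cats/*.jpg'))
for paths in chunked(files, 20):
    x = load_chunk(paths)  # pass this x to datagen.flow
    print(x.shape)
```

The last chunk is simply shorter when the number of files is not a multiple of 20.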

For example,

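import glob
import numpy as np
from keras.preprocessing.image import img_to_array, load_img

x = []
list_of_files = glob.glob('data/train/cats/*.jpg')
for file in list_of_files:
    # Assumption: resize to 150x150 on load so every array has the same shape,
    # which np.array needs in order to stack them.
    img = load_img(file, target_size=(150, 150))  # this is a PIL image
    x.append(img_to_array(img))
x = np.array(x)  # feed this x to your datagen.flow

# print(x.shape)
# (n, 150, 150, 3) with the default channels_last format

# (Note: n is the length of list_of_files, i.e. total dataset length)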
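Feeding the stacked x into datagen.flow could then look like the sketch below. The function names batches_needed and augment_all are hypothetical, and the augmentation settings are copied from the question. Since flow cycles through the data forever, the loop has to stop itself. Here it stops after enough batches for each image to have been augmented copies_per_image times.

```python
import math
import os


def batches_needed(n_images, batch_size, copies_per_image):
    """Batches to draw so that every image is augmented copies_per_image times."""
    # One pass over the data takes ceil(n_images / batch_size) batches.
    return math.ceil(n_images / batch_size) * copies_per_image


def augment_all(x, out_dir='preview', batch_size=20, copies_per_image=20):
    """Save augmented versions of every image in the 4-D array x."""
    # Imported here so batches_needed can be used without Keras installed.
    from keras.preprocessing.image import ImageDataGenerator

    if len(x) == 0:
        return
    datagen = ImageDataGenerator(
        rotation_range=40, width_shift_range=0.2, height_shift_range=0.2,
        shear_range=0.2, zoom_range=0.2, horizontal_flip=True,
        fill_mode='nearest')
    os.makedirs(out_dir, exist_ok=True)  # save_to_dir must already exist
    total = batches_needed(len(x), batch_size, copies_per_image)
    flow = datagen.flow(x, batch_size=batch_size, shuffle=False,
                        save_to_dir=out_dir, save_prefix='cat',
                        save_format='jpeg')
    # Every batch drawn writes its augmented images to out_dir.
    for i, _ in enumerate(flow, start=1):
        if i >= total:
            break
```

For example, 45 images with batch_size=20 need 3 batches per pass, so 3 copies of each image take 9 batches.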

Upvotes: 1
