Reputation: 21
I'm trying to augment the ISIC 2019 dataset images with 9 classes. The 'NV' class is overrepresented (12876 of a total of 25331 images) so I'd like to exclude it from the augmentation process but later on recombine the augmented images and the unchanged 'NV' images.
I'd like to have an ImageDataGenerator object like this with a training/validation split so I can use it for on-the-fly augmentation.
I wasn't able to combine two ImageDataGenerators, as suggested elsewhere on the Internet (here).
I tried the following code, but I can't figure out how to write an on-the-fly data generator. It doesn't work for me: it doesn't even save any images. Also, converting the images and saving them for later use will probably take too much time (I'm using Google Drive with only 15 GB of storage).
from keras.preprocessing.image import ImageDataGenerator
import os
import shutil
# Define data folders
data_directory = "/content/dataset" # Path to the ISIC 2019 images
target_directory = "/content/dataset_aug" # Path to the augmented images already sorted in subfolders
# Create dataset_aug folder
os.makedirs(target_directory, exist_ok=True)
# Class names
class_names = ["AK", "BCC", "BKL", "DF", "MEL", "NV", "SCC", "UNK", "VASC"]
# Augmentation parameters
datagen_aug = ImageDataGenerator(
    rescale=1/255.,
    rotation_range=180,
    width_shift_range=0.1,
    height_shift_range=0.1,
    zoom_range=0.1,
    horizontal_flip=True,
    vertical_flip=True,
    fill_mode='nearest'
)
# Iterate over all classes and do an augmentation
for class_name in class_names:
    class_directory = os.path.join(data_directory, class_name)
    target_class_directory = os.path.join(target_directory, class_name)
    # Create target directory
    if not os.path.exists(target_class_directory):
        os.makedirs(target_class_directory)
    if class_name == "NV":
        # Copy all NV images unchanged to the target directory
        image_files = os.listdir(class_directory)
        for image_file in image_files:
            source_path = os.path.join(class_directory, image_file)
            target_path = os.path.join(target_class_directory, image_file)
            shutil.copy(source_path, target_path)
    else:
        # Do the augmentation for the other classes and save them in their target directories
        image_generator = datagen_aug.flow_from_directory(
            class_directory,
            target_size=(224, 224),
            batch_size=32,
            class_mode=None,
            save_to_dir=target_class_directory,
            save_prefix='aug_',
            save_format='png'
        )
        num_augmented_images = 9200 # Number of augmented images per class
        for i in range(num_augmented_images):
            batch = next(image_generator)
            if (i + 1) % 100 == 0:
                print(f"Generated {i+1} augmented images for class {class_name}")
print("Data augmentation completed.")
Upvotes: 0
Views: 155
Reputation: 21
There are a few ways you can approach this:
1. Use two separate ImageDataGenerator instances: one that augments the other classes, and one without augmentation for the "NV" class. Then concatenate or merge their outputs when loading the data.
2. Subclass ImageDataGenerator to customize the augmentation logic. In the flow and flow_from_directory methods, you can check the class name and apply different augmentation depending on the class.
3. Manually augment the images of the other classes first to create more samples. Then combine those augmented images with the original "NV" images and pass the full dataset through an ImageDataGenerator that applies no further augmentation.
Here is an example of approach 1:
# ImageDataGenerator for augmenting images
aug_datagen = ImageDataGenerator(...)
# ImageDataGenerator without augmentation
no_aug_datagen = ImageDataGenerator()
# Generate augmented data for non-NV classes
aug_generator = aug_datagen.flow_from_directory(...)
# Generate unaugmented data for NV class
no_aug_generator = no_aug_datagen.flow_from_directory(...)
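# Concatenate the generators (ConcatinateImageGenerators is a placeholder:
# Keras has no such class, so you need to write this helper yourself)
concat_generator = ConcatinateImageGenerators([aug_generator, no_aug_generator])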
# Fit model using the concatenated generator
model.fit(concat_generator)
The key idea is to generate the augmented and non-augmented images separately, and then concatenate them when loading the data.
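Since Keras does not ship a ready-made class for merging generators, the concatenation step has to be written by hand. Here is a minimal sketch of one way to do it, using only the standard library. The name combine_generators and its weights/seed parameters are my own and not part of Keras. It assumes every source generator is endless (as Keras directory iterators are) and that all of them yield labels in the same encoding.

```python
import itertools
import random


def combine_generators(generators, weights=None, seed=None):
    """Yield batches forever, drawing each one from a randomly chosen source.

    `generators` is a list of endless iterators that yield batches;
    `weights` controls how often each source is picked (uniform if None).
    """
    rng = random.Random(seed)
    while True:
        # Pick one source per step, then forward its next batch unchanged.
        (source,) = rng.choices(generators, weights=weights, k=1)
        yield next(source)


# Stand-in batch sources; in practice these would be the two Keras iterators.
aug_batches = itertools.cycle(["aug"])
nv_batches = itertools.cycle(["nv"])

mixed = combine_generators([aug_batches, nv_batches], weights=[1, 1], seed=0)
first_ten = [next(mixed) for _ in range(10)]
```

Because the combined generator never ends, model.fit would need an explicit steps_per_epoch when it is trained on it. The weights let you control how many "NV" batches the model sees relative to the augmented ones.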
2nd approach:
This approach augments the dataset while excluding the 'NV' class:
import numpy as np
# Augment all classes except 'NV'
datagen = ImageDataGenerator(...)
augmented_batches = []
for class_name in class_names:
    if class_name != 'NV':
        # Augment this class and keep the resulting batch
        augmented_batches.append(next(datagen.flow(...)))
# Load original 'NV' images (load_images is a placeholder for your own loading code)
nv_images = load_images('NV')
# Concatenate augmented images from other classes with NV images
all_images = np.concatenate(augmented_batches + [nv_images])
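Alternatively, filter inside the batch loop:
datagen = ImageDataGenerator(...)
# Load batches without augmentation; datagen is only used for random_transform below
flow = ImageDataGenerator().flow_from_directory(data_dir)
nv_index = flow.class_indices['NV']
for images, labels in flow:  # Keras iterators loop forever, so break when you have enough
    for image, label in zip(images, labels):
        if label.argmax() == nv_index:
            # Pass through NV images unchanged
            continue
        # Otherwise augment the image (random_transform expects a single image, not a batch)
        augmented_image = datagen.random_transform(image)
        # Save augmented images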
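Or use two generators, as in approach 1:
nv_datagen = ImageDataGenerator() # Identity transform
augment_datagen = ImageDataGenerator(...)
# Load NV class without augmentation
nv_flow = nv_datagen.flow_from_directory(data_dir, classes=['NV'])
# Augment other classes
other_classes = [c for c in class_names if c != 'NV']
augmented_flow = augment_datagen.flow_from_directory(data_dir, classes=other_classes)
# Concatenate generator outputs (concat_gen is a placeholder for a helper you write yourself;
# the two flows encode labels differently, so remap them to one shared encoding first)
combined_flow = concat_gen([nv_flow, augmented_flow])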
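The per-image check (leave 'NV' alone, transform everything else) can also be wrapped in a generator, so it runs on the fly and nothing is written to disk. The following is a rough sketch, not a tested Keras recipe. The function augment_except and its parameters are hypothetical names. It assumes the batches are (images, one-hot labels) pairs, as flow_from_directory yields with class_mode='categorical'. With Keras you would presumably pass a non-augmenting iterator as batches, augment_datagen.random_transform as transform, and flow.class_indices['NV'] as skip_index.

```python
import numpy as np


def augment_except(batches, transform, skip_index):
    """Apply `transform` to every image whose class is not `skip_index`.

    `batches` yields (images, one_hot_labels) pairs; `transform` maps a
    single image array to an image array of the same shape.
    """
    for images, labels in batches:
        # Work on a copy so the caller's arrays are left untouched.
        out = np.array(images, dtype="float32", copy=True)
        for i, label in enumerate(labels):
            if int(np.argmax(label)) != skip_index:
                out[i] = transform(out[i])
        yield out, labels


# Stand-in data: two 2x2 single-channel images; class 1 plays the role of 'NV'.
images = np.ones((2, 2, 2, 1), dtype="float32")
labels = np.array([[1, 0], [0, 1]], dtype="float32")


def flip_sign(image):
    # Placeholder for a real augmentation such as random_transform.
    return -image


wrapped = augment_except(iter([(images, labels)]), flip_sign, skip_index=1)
out_images, out_labels = next(wrapped)
```

In the stand-in data, only the first image (class 0) is transformed; the second (the skipped class) passes through unchanged. Note that random_transform only applies the geometric and color transforms, so any rescaling would have to be handled by the loading generator.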
Upvotes: 1