Validation set only gets images from one class when using keras ImageDataGenerator flow_from_dataframe

Question

I have a list of images, each with the class it belongs to, in this format:

list.txt

image1 good
image2 good
image3 good
.
.
.
image4 bad
image5 bad
image6 bad

I used the ImageDataGenerator to split validation data:

train_datagen = ImageDataGenerator(rescale=1./255, validation_split = 0.25)

I used pandas to read the file into a dataframe:

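import pandas as pd

# Whitespace-separated file with no header row: filename first, then class label
load_images = pd.read_csv("list.txt", sep=r"\s+", header=None)
load_images.columns = ['filename', 'class']
load_images.columns = load_images.columns.str.strip()

# Same spelling as the name passed to flow_from_dataframe below
trainDataFrame = load_images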

I used flow_from_dataframe to create train and validation generators:

train_generator = train_datagen.flow_from_dataframe(
        trainDataFrame,
        x_col = 'filename',
        y_col = 'class',
        directory = path_to_parent_folder_of_images,
        target_size=(inputHeight, inputWidth),
        batch_size=batch_size,
        class_mode='categorical',
        subset = 'training',
        save_to_dir = r"path_to_folder\training",  # raw string so "\t" is not read as a tab
        shuffle = True)

validation_generator = train_datagen.flow_from_dataframe(
        trainDataFrame,
        x_col = 'filename',
        y_col = 'class',
        directory = path_to_parent_folder_of_images,
        target_size=(inputHeight, inputWidth),
        batch_size=batch_size,
        class_mode='categorical',
        subset= 'validation',
        save_to_dir = r"path_to_folder\validation",  # raw string so "\v" is not read as a vertical tab
        shuffle = True)

Finally I train the model:

model.fit_generator(
    train_generator,
    steps_per_epoch = train_generator.n // train_generator.batch_size,
    epochs = epochs,
    validation_data = validation_generator,
    validation_steps = validation_generator.n // validation_generator.batch_size,
    callbacks = callback_list)

The problem is that the validation set only contains images from class bad. There are no images of the other class. I used the save_to_dir parameter to write the generated images to disk, and in the validation folder I only see images from one class. The training generator seems fine (it has images of both good and bad). Because of this, my validation accuracy is always 0 or 1. I have seen examples online and tried to follow them. Nobody else seems to face this problem, so I am not sure what I am doing incorrectly.

I am using these versions:

python - 3.7.4
tensorflow - 2.0.0
keras - 2.3.1

shreeya · Accepted Answer

I realized that flow_from_dataframe() takes the first 25% of the rows for the validation set instead of choosing them randomly. My list is sorted, with all the good images together and all the bad images together, so the first 25% of the rows came from a single class and the validation set always received only good images. I shuffled the dataframe before passing it to flow_from_dataframe():

from sklearn.utils import shuffle
dataframes = shuffle(dataframes)

That solved the problem.
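To make the effect easier to see, here is a minimal sketch of the behaviour described above. It does not call Keras at all: the 12-row dataframe is a made-up stand-in for list.txt, and head_split is a hypothetical helper that only imitates "the first 25% of rows go to validation". It also uses DataFrame.sample(frac=1) as a pandas-only way to shuffle the rows, in place of sklearn.utils.shuffle.

```python
import pandas as pd

# Hypothetical stand-in for list.txt: 6 "good" rows followed by 6 "bad" rows.
df = pd.DataFrame({
    "filename": [f"image{i}" for i in range(1, 13)],
    "class": ["good"] * 6 + ["bad"] * 6,
})


def head_split(frame, validation_split=0.25):
    """Imitate the split described in the answer (not Keras code):
    the first `validation_split` fraction of rows becomes validation."""
    cut = int(len(frame) * validation_split)
    return frame.iloc[cut:], frame.iloc[:cut]


# Sorted input: every validation row comes from the class listed first.
_, val_sorted = head_split(df)
sorted_val_classes = set(val_sorted["class"])
print("sorted validation classes:", sorted_val_classes)

# Shuffle all rows once, then split the same way.
# random_state is fixed only to make the sketch repeatable.
shuffled = df.sample(frac=1, random_state=0).reset_index(drop=True)
train_shuffled, val_shuffled = head_split(shuffled)
print(val_shuffled["class"].value_counts())
```

Shuffling only makes a mixed validation set likely; it does not guarantee the same class proportions in both subsets. If you need exact proportions, one option is to do a stratified split yourself (for example with sklearn's train_test_split and its stratify argument) and pass the two resulting dataframes to separate flow_from_dataframe() calls without validation_split.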
