How to input multiple images with flow_from_dataframe in Keras?

Question

I have been trying to build a Siamese model that measures the similarity between two images, so it takes two input images. At first I tested it on a small dataset that fit in RAM, and it worked reasonably well. Now I want to increase the training sample size, so I created an images.csv file with three columns: image_1, image_2, and similarity.

image_1 and image_2 are absolute paths to images. similarity is either 0 or 1.

I tried

generator.flow_from_dataframe(dataframe, target_size=(64, 64, 1), x_col=['image_1', 'image_2'],
                                                    y_col='similarity',
                                                    class_mode='sparse', subset='training')

but got this error:

ValueError: All values in column x_col=['image_1', 'image_2'] must be strings.

After removing image_2 and setting x_col='image_1', the error disappeared, but then the model receives only one input image.

What should I do?

Raguel · Accepted Answer

With the help of @nuric I was able to input multiple images. Here is the full code for creating the flow:

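def get_flow_from_dataframe(generator, dataframe,
                            image_shape=(64, 64),
                            subset='training',
                            color_mode='grayscale', batch_size=64):
    """Yield ([image_1 batch, image_2 batch], labels) batches forever."""
    # Both iterators read the same dataframe with the same seed, so they
    # shuffle identically and batch i of each refers to the same rows.
    # y_col is the label column as named in this answer; the question's CSV
    # calls it 'similarity'.
    train_generator_1 = generator.flow_from_dataframe(dataframe, target_size=image_shape,
                                                      color_mode=color_mode,
                                                      x_col='image_1',
                                                      y_col='prediction',
                                                      class_mode='binary',
                                                      shuffle=True,
                                                      batch_size=batch_size,
                                                      seed=7,
                                                      subset=subset, drop_duplicates=False)

    train_generator_2 = generator.flow_from_dataframe(dataframe, target_size=image_shape,
                                                      color_mode=color_mode,
                                                      x_col='image_2',
                                                      y_col='prediction',
                                                      class_mode='binary',
                                                      shuffle=True,
                                                      batch_size=batch_size,
                                                      seed=7,
                                                      subset=subset, drop_duplicates=False)
    while True:
        x_1 = next(train_generator_1)  # (image_1 batch, labels)
        x_2 = next(train_generator_2)  # (image_2 batch, same labels)

        # Two image inputs for the Siamese model, one shared label batch.
        yield [x_1[0], x_2[0]], x_1[1]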
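
The reason the two iterators stay paired can be illustrated without Keras. The following is a minimal plain-Python sketch, not Keras code: batch_indices and paired_batches are hypothetical stand-ins for the two flow_from_dataframe iterators and the wrapper above, and the file names are made up. It assumes, as the wrapper above does, that both streams shuffle with the same seed over the same number of rows.

```python
import random


def batch_indices(n_samples, batch_size, seed):
    """Yield shuffled index batches forever, reshuffling once per epoch."""
    rng = random.Random(seed)
    while True:
        order = list(range(n_samples))
        rng.shuffle(order)
        for start in range(0, n_samples, batch_size):
            yield order[start:start + batch_size]


def paired_batches(rows, batch_size, seed=7):
    """Combine two same-seed index streams into ([x_1, x_2], y) batches."""
    gen_1 = batch_indices(len(rows), batch_size, seed)
    gen_2 = batch_indices(len(rows), batch_size, seed)
    while True:
        idx_1 = next(gen_1)
        idx_2 = next(gen_2)
        x_1 = [rows[i][0] for i in idx_1]  # first image of each pair
        x_2 = [rows[i][1] for i in idx_2]  # second image of each pair
        y = [rows[i][2] for i in idx_1]    # labels taken from the first stream
        yield [x_1, x_2], y


# Hypothetical rows shaped like images.csv: (image_1, image_2, similarity).
rows = [("a%d.png" % i, "b%d.png" % i, i % 2) for i in range(10)]
pairs = paired_batches(rows, batch_size=4)
(x_1, x_2), y = next(pairs)
```

If the two streams used different seeds, or if one of them covered fewer rows than the other, the index batches would diverge and image_1 would be paired with the wrong image_2.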

The full fit_generator call:

train_gen = get_flow_from_dataframe(generator, dataframe, image_shape=(64, 64),
                                    color_mode='rgb',
                                    batch_size=batch_size)
valid_gen = get_flow_from_dataframe(generator, dataframe, image_shape=(64, 64),
                                    color_mode='rgb',
                                    batch_size=batch_size,
                                    subset='validation')

model.fit_generator(train_gen, epochs=50,
                    steps_per_epoch=step_size,
                    validation_data=valid_gen,
                    validation_steps=step_size,
                    callbacks=get_call_backs('../models/model_1.h5', monitor='val_acc'))
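
step_size is not defined in the snippet above. Because get_flow_from_dataframe is an endless Python generator, fit_generator needs explicit step counts, and a common choice is the number of batches needed to cover each subset once. A minimal sketch, assuming the generator was created with a validation_split and that the split puts int(n_rows * validation_split) rows in the validation subset; the helper name steps_for_subset and the example numbers are hypothetical:

```python
import math


def steps_for_subset(n_rows, batch_size, validation_split=0.0, subset="training"):
    """Return how many batches cover one pass over the chosen subset."""
    # Assumption: the validation subset holds int(n_rows * validation_split)
    # rows and the training subset holds the rest.
    n_validation = int(n_rows * validation_split)
    if subset == "validation":
        n_subset = n_validation
    else:
        n_subset = n_rows - n_validation
    # Round up so the final, smaller batch is still counted.
    return math.ceil(n_subset / batch_size)


# Hypothetical numbers: 1000 rows in images.csv, batch size 64, 20% validation.
train_steps = steps_for_subset(1000, 64, validation_split=0.2, subset="training")
valid_steps = steps_for_subset(1000, 64, validation_split=0.2, subset="validation")
```

With counts like these, steps_per_epoch and validation_steps would each receive their own value instead of sharing one step_size.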

One caveat: from what I can see, memory consumption with this approach is very high.
