Raguel

Reputation: 625

How to input multiple images with flow_from_dataframe in keras?

I have been trying to build a Siamese model that measures the similarity between two images, so it takes two input images. I first tested it on a small dataset that fit in RAM, and it worked reasonably well. Now I want to increase the training sample size, so I created an images.csv file with three columns: image_1, image_2, similarity.

image_1 and image_2 are absolute paths to images. similarity is either 0 or 1.

I tried

generator.flow_from_dataframe(dataframe, target_size=(64, 64, 1), x_col=['image_1', 'image_2'],
                                                    y_col='similarity',
                                                    class_mode='sparse', subset='training')

but got this error:

ValueError: All values in column x_col=['image_1', 'image_2'] must be strings.

After removing image_2 and setting x_col='image_1', the error disappeared, but then the generator yields only one input image.

What should I do?

Upvotes: 4

Views: 3445

Answers (2)

Raguel

Reputation: 625

With the help of @nuric I was able to input multiple images. Here is the full code for creating the flow:

def get_flow_from_dataframe(generator, dataframe,
                            image_shape=(64, 64),
                            subset='training',
                            color_mode='grayscale', batch_size=64):
    train_generator_1 = generator.flow_from_dataframe(dataframe, target_size=image_shape,
                                                      color_mode=color_mode,
                                                      x_col='image_1',
                                                      y_col='prediction',
                                                      class_mode='binary',
                                                      shuffle=True,
                                                      batch_size=batch_size,
                                                      seed=7,
                                                      subset=subset, drop_duplicates=False)

    train_generator_2 = generator.flow_from_dataframe(dataframe, target_size=image_shape,
                                                      color_mode=color_mode,
                                                      x_col='image_2',
                                                      y_col='prediction',
                                                      class_mode='binary',
                                                      shuffle=True,
                                                      batch_size=batch_size,
                                                      seed=7,
                                                      subset=subset, drop_duplicates=False)
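    # Both flows use shuffle=True with the same seed, so they walk the rows in
    # the same order and each image_1 batch lines up with its image_2 batch.
    while True:
        x_1 = next(train_generator_1)
        x_2 = next(train_generator_2)

        # Two image batches as model inputs; labels are taken from the first flow.
        yield [x_1[0], x_2[0]], x_1[1]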
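
To make it clearer why the two flows stay aligned, here is a minimal pure-Python sketch of the same pairing pattern. It does not use Keras at all: fake_flow and paired_flow are hypothetical stand-ins I made up for illustration, and the file names are invented. The point is only that two shuffled iterators seeded identically visit the rows in the same order, so batch k of image_1 matches batch k of image_2.

```python
import random


def fake_flow(paths, labels, batch_size, seed):
    """Hypothetical stand-in for a Keras iterator: yields (paths, labels) batches forever."""
    rng = random.Random(seed)
    order = list(range(len(paths)))
    while True:
        rng.shuffle(order)  # reshuffle at the start of every epoch
        for start in range(0, len(order), batch_size):
            idx = order[start:start + batch_size]
            yield [paths[i] for i in idx], [labels[i] for i in idx]


def paired_flow(flow_1, flow_2):
    """Combine two single-input flows into ([x_1, x_2], y) batches."""
    while True:
        x_1, y_1 = next(flow_1)
        x_2, y_2 = next(flow_2)
        # Cheap sanity check: differing labels prove the flows have drifted apart.
        if y_1 != y_2:
            raise RuntimeError("generators are out of sync")
        yield [x_1, x_2], y_1


# Invented toy data: row i pairs image_1[i] with image_2[i].
image_1 = ["a1.png", "b1.png", "c1.png", "d1.png", "e1.png"]
image_2 = ["a2.png", "b2.png", "c2.png", "d2.png", "e2.png"]
similarity = [0, 1, 1, 0, 1]

pairs = paired_flow(
    fake_flow(image_1, similarity, batch_size=2, seed=7),
    fake_flow(image_2, similarity, batch_size=2, seed=7),
)
(batch_1, batch_2), labels = next(pairs)
```

With real Keras batches the labels are NumPy arrays, so the equality check would need something like numpy.array_equal instead of !=. Also keep in mind that if a file is missing or unreadable in only one of the two columns, the flows may end up with different row sets and the alignment breaks even with the same seed.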

The full fit_generator call:

train_gen = get_flow_from_dataframe(generator, dataframe, image_shape=(64, 64),
                                    color_mode='rgb',
                                    batch_size=batch_size)
valid_gen = get_flow_from_dataframe(generator, dataframe, image_shape=(64, 64),
                                    color_mode='rgb',
                                    batch_size=batch_size,
                                    subset='validation')

model.fit_generator(train_gen, epochs=50,
                    steps_per_epoch=step_size,
                    validation_data=valid_gen,
                    validation_steps=step_size,
                    callbacks=get_call_backs('../models/model_1.h5', monitor='val_acc'))
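
The snippet above does not show how step_size is computed. Because the wrapper is an endless Python generator, Keras cannot infer the epoch length, so the step counts have to be passed explicitly. A rough sketch of one common way to derive them follows; steps_for is a hypothetical helper, the row count and split fraction are made-up numbers, and I am assuming the validation subset gets roughly validation_split of the rows.

```python
import math


def steps_for(n_samples, batch_size):
    """Batches needed to see every sample once (the last batch may be smaller)."""
    return math.ceil(n_samples / batch_size)


# Made-up numbers for illustration only.
n_rows = 10_000          # rows in images.csv
validation_split = 0.2   # assumed fraction given to ImageDataGenerator
batch_size = 64

# Assumption: about validation_split of the rows land in the validation subset.
n_valid = int(n_rows * validation_split)
n_train = n_rows - n_valid

train_steps = steps_for(n_train, batch_size)  # 125
valid_steps = steps_for(n_valid, batch_size)  # 32
```

These would then go in as steps_per_epoch=train_steps and validation_steps=valid_steps, rather than reusing a single value for both. The real counts can be lower if some image files turn out to be invalid and are dropped.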

One caveat: as far as I can see, memory consumption with this approach is huge.

Upvotes: 2

nuric

Reputation: 11225

You can't flow two images from a single generator with that method; it is designed to handle one image column. From the documentation:

x_col: string, column in dataframe that contains the filenames (or absolute paths if directory is None).

Instead, you can create two generators and, more appropriately, give your network two inputs:

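# target_size is (height, width) only; the single channel comes from color_mode.
# A shared seed keeps the two shuffled flows aligned row for row.
in1 = generator.flow_from_dataframe(dataframe, target_size=(64, 64), x_col='image_1',
                                    y_col='similarity', color_mode='grayscale',
                                    class_mode='sparse', seed=7, subset='training')

in2 = generator.flow_from_dataframe(dataframe, target_size=(64, 64), x_col='image_2',
                                    y_col='similarity', color_mode='grayscale',
                                    class_mode='sparse', seed=7, subset='training')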

Then build a model with the functional API that accepts two image inputs:

from keras.layers import Input
from keras.models import Model

input_image1 = Input(shape=(64, 64, 1))
input_image2 = Input(shape=(64, 64, 1))
# ... all other layers to create output_layer
model = Model([input_image1, input_image2], output_layer)
# ...

This better reflects the fact that your model takes two images as input.

Upvotes: 2
