Reputation: 190
I work with many images (10M+) stored in a single directory (no subfolder per class) and use a pandas DataFrame to keep track of the class labels. The images do not fit in memory, so I must read minibatches from disk. So far I have used Keras .flow_from_directory(), but it requires me to move images into one subfolder per class (and per train/validation split). It works well, but it becomes very impractical when I want to use different subsets of images or define classes in various ways. Does anyone have an alternative strategy that uses a database (e.g. a pandas.DataFrame) to keep track of which minibatches to read, instead of moving images into subfolders?
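For context, here is a minimal sketch of the kind of bookkeeping I mean (the column names `id` and `label` are just for illustration):

    import pandas as pd

    # A DataFrame tracking image ids and labels, so subsets and class
    # definitions can be changed without moving any files on disk
    df = pd.DataFrame({
        "id": ["img_001", "img_002", "img_003", "img_004"],
        "label": ["cat", "dog", "cat", "bird"],
    })

    # Select an arbitrary subset of images by label, no directory shuffling needed
    subset_ids = df.loc[df["label"].isin(["cat", "dog"]), "id"].tolist()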
Upvotes: 0
Views: 734
Reputation: 8527
You need a custom data generator. For example (this assumes `batch_size`, `dpath`, `df_train` and `classes` are defined elsewhere):

    import numpy as np
    import cv2
    from keras.utils import to_categorical

    def batch_generator(ids):
        while True:
            for start in range(0, len(ids), batch_size):
                x_batch = []
                y_batch = []
                end = min(start + batch_size, len(ids))
                ids_batch = ids[start:end]
                for id in ids_batch:
                    img = cv2.imread(dpath + 'train/{}.jpg'.format(id))
                    #img = cv2.resize(img, (224, 224), interpolation=cv2.INTER_AREA)
                    # Look up this id's label in the DataFrame ([0] extracts the scalar)
                    labelname = df_train.loc[df_train.id == id, 'column_name'].values[0]
                    labelnum = classes.index(labelname)
                    x_batch.append(img)
                    y_batch.append(labelnum)
                x_batch = np.array(x_batch, np.float32)
                y_batch = to_categorical(y_batch, 120)
                yield x_batch, y_batch
Then you can call the generator with just a numpy array of ids (or image names), like this:
    model.fit_generator(generator=batch_generator(ids_train_split),
                        steps_per_epoch=np.ceil(float(len(ids_train_split)) / float(batch_size)),
                        epochs=epochs, verbose=1, callbacks=callbacks,
                        validation_data=batch_generator(ids_valid_split),
                        validation_steps=np.ceil(float(len(ids_valid_split)) / float(batch_size)))
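If you don't already have the id arrays, one way to build `ids_train_split` and `ids_valid_split` from your full id list is a simple shuffled split (a sketch; the 80/20 ratio and the fixed seed are assumptions):

    import numpy as np

    # Hypothetical full id list; in practice this would come from your DataFrame
    ids = np.array(["img_{:03d}".format(i) for i in range(100)])

    # Shuffle once with a fixed seed for a reproducible split
    rng = np.random.RandomState(42)
    perm = rng.permutation(len(ids))
    n_valid = int(0.2 * len(ids))
    ids_valid_split = ids[perm[:n_valid]]
    ids_train_split = ids[perm[n_valid:]]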
Upvotes: 2