James

Reputation: 307

How to correctly implement Keras's fit_generator on multiple datasets?

I am having trouble getting Keras's fit_generator function to work. I have followed the Keras documentation and several other guides online, but I can't get it working.

When I run fit_generator, it does not throw an error. Something is clearly running in the background, because Task Manager shows my GPU usage jumping to about 70%. However, no progress output appears to indicate that batches are being processed by my convolutional neural network.

This is my model

import keras
from keras.layers import Conv2D, MaxPooling2D, GlobalAveragePooling2D
from keras.layers import Dropout, Flatten, Dense
from keras.models import Sequential

model = Sequential()

model.add(Conv2D(filters=80, kernel_size=4, strides=1, activation='relu', input_shape=(180, 180, 3)))
model.add(Dropout(rate = 0.2))
model.add(MaxPooling2D(pool_size=2, strides=2))
model.add(Conv2D(filters=60, kernel_size=2, strides=1, activation='relu'))
model.add(Dropout(rate = 0.2))
model.add(MaxPooling2D(pool_size=2, strides=2))
model.add(Dense(units = 40, activation = 'relu'))
model.add(Dense(units = 20, activation = 'relu'))
model.add(Flatten())
model.add(Dense(units=5270, activation='softmax'))

model.compile(loss="categorical_crossentropy", optimizer="rmsprop", metrics=['accuracy'])
model.summary()

This is my batch generator

I have six HDF5 files to loop through, each containing 40,000 images. The data is already stored as NumPy arrays. The generator yields batches of 20 samples at a time.

import h5py

def train_generator():
    counter = 1
    batch_size = 20
    num_files = 6

    while True:

        # Load one file pair into memory: 40,000 images and their labels.
        # There are six of these pairs in total, so 40000*6 = 240,000 items in the entire training set.
        # 240,000 images for each epoch
        with h5py.File('x_train' + str(counter) + 'catID.h5', 'r') as h5f:
            pic_arr = h5f['dataset'][0:40000]

        with h5py.File('y_train' + str(counter) + 'catID.h5', 'r') as h5f:
            cat_arr = h5f['dataset'][0:40000]

        # Each file holds 40,000 samples and batch_size is 20, so this loop yields 40000/20 = 2000 batches
        for index in range(0, 40000, batch_size):
            x_train = pic_arr[index:index + batch_size]
            y_train = cat_arr[index:index + batch_size]
            yield (x_train, y_train)

        del pic_arr
        del cat_arr
        # Wrap back to the first file after the sixth one; without this,
        # the second epoch would try to open x_train7catID.h5.
        counter = counter + 1 if counter < num_files else 1
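
The file-cycling logic can be sketched without HDF5 at all. The version below is a minimal illustration, not the exact generator above: `cycling_batches` and the in-memory `shards` list are hypothetical stand-ins for the six files, and it assumes each shard's length is a multiple of the batch size. After the last shard it wraps back to the first, so it can keep feeding batches across epochs.

```python
def cycling_batches(shards, batch_size):
    """Yield (x, y) batches forever, looping over the shards in order.

    `shards` is a list of (x_data, y_data) pairs standing in for the
    HDF5 file pairs; this helper is hypothetical, for illustration only.
    """
    shard_idx = 0
    while True:
        x_all, y_all = shards[shard_idx]
        for start in range(0, len(x_all), batch_size):
            yield x_all[start:start + batch_size], y_all[start:start + batch_size]
        # Wrap around so the next pass starts again at the first shard.
        shard_idx = (shard_idx + 1) % len(shards)


# Tiny stand-in data: 3 shards of 4 samples each, batch size 2.
# Plain lists are used here; NumPy arrays slice the same way.
shards = [([k] * 4, [k] * 4) for k in range(3)]
gen = cycling_batches(shards, batch_size=2)

# Record which shard each of the first 8 batches came from.
first_labels = [next(gen)[1][0] for _ in range(8)]
print(first_labels)
```

With three shards of two batches each, the seventh and eighth batches come from the first shard again, which is the behaviour a multi-epoch run needs.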

Fitting my model

When I fit my model with the generator, I know that my GPU is processing the data; I have an NVIDIA GTX 1070. But no progress output is displayed when I run the code below. I also tried running without the GPU, with the same result. Is there something I'm doing wrong here?

from keras.callbacks import ModelCheckpoint
import tensorflow as tf

# This is used to store the best weights for our trained model.
# No validation data is passed, so monitor the training loss instead of the default val_loss.
checkpointer = ModelCheckpoint(filepath='weights_bestcatID.hdf5', monitor='loss',
                               verbose=1, save_best_only=True)

# steps_per_epoch=12000 because 240,000 (total samples) / 20 (batch size) = 12000
# Keras 2 names the epoch argument `epochs`; `nb_epoch` is the older Keras 1 spelling.
with tf.device('/device:GPU:0'):
    model.fit_generator(train_generator(), steps_per_epoch=12000, epochs=4, verbose=1, callbacks=[checkpointer])
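
The steps_per_epoch arithmetic can be written out as a small helper. This is only a sketch: the function name `steps_per_epoch_for` is hypothetical, and rounding up is an assumption that covers the case where the total is not an exact multiple of the batch size (here it divides evenly).

```python
import math


def steps_per_epoch_for(samples_per_file, num_files, batch_size):
    """Number of generator batches that make up one pass over all files."""
    total_samples = samples_per_file * num_files
    # Round up so a final partial batch, if any, still counts as a step.
    return math.ceil(total_samples / batch_size)


# Six files of 40,000 samples each, batch size 20.
steps = steps_per_epoch_for(samples_per_file=40000, num_files=6, batch_size=20)
print(steps)
```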

Upvotes: 2

Views: 1120

Answers (1)

James

Reputation: 307

Never mind. I ran the same code again and it worked. If anyone needs a reference for how to implement Keras's fit_generator, the approach above works.
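
For anyone reusing this, one way to confirm that a generator is producing batches before handing it to fit_generator is to pull a few by hand. The sketch below is an assumption-laden illustration: `check_batches` and `toy_generator` are hypothetical names, and the toy generator stands in for the real one so the snippet runs without any data files.

```python
from itertools import islice


def check_batches(generator, expected_batch_size, num_batches=3):
    """Pull a few batches and confirm x and y line up (hypothetical helper)."""
    sizes = []
    for x_batch, y_batch in islice(generator, num_batches):
        assert len(x_batch) == len(y_batch), "x and y batch lengths differ"
        assert len(x_batch) == expected_batch_size, "unexpected batch size"
        sizes.append(len(x_batch))
    return sizes


def toy_generator():
    """Stand-in for the real generator: endless batches of 20 dummy samples."""
    while True:
        yield [0.0] * 20, [1] * 20


sizes = check_batches(toy_generator(), expected_batch_size=20)
print(sizes)
```

If this check returns promptly with the expected sizes, the generator itself is not what is stalling.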

Upvotes: 1
