user1315789

Reputation: 3659

Fit data into a Keras machine learning model when the data is huge

In machine learning tutorials using Keras, the code to train the model is typically this one-liner:

model.fit(X_train,
          Y_train,
          nb_epoch=5,
          batch_size=128,
          verbose=1,
          validation_split=0.1)

This seems easy when the training data X_train and Y_train are small; both are numpy ndarrays. In practice, though, the training data can run into gigabytes, which may be too large to fit into the computer's RAM.

How do you feed data into model.fit() when the training data is too large to hold in memory?

Upvotes: 3

Views: 453

Answers (1)

ixeption

Reputation: 2060

There is a simple solution for that in Keras: use Python generators, so the data is lazily loaded one batch at a time. If you are working with images, you can also use the ImageDataGenerator (a sketch follows at the end of this answer).

import numpy as np

def generate_data(x, y, batch_size):
    # Loop forever: fit_generator draws steps_per_epoch batches per epoch
    while True:
        for i in range(0, len(x), batch_size):
            # Yield an (inputs, targets) tuple; only this slice
            # is materialised in memory
            yield np.array(x[i:i + batch_size]), np.array(y[i:i + batch_size])
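
The x and y passed in do not have to live in memory. As a minimal sketch, assuming the data was saved beforehand as .npy files (the file names here are placeholders), numpy's mmap_mode keeps the arrays on disk and reads only the slices the generator actually touches:

x = np.load('X_train.npy', mmap_mode='r')  # memory-mapped, stays on disk
y = np.load('Y_train.npy', mmap_mode='r')
num_batches = int(np.ceil(len(x) / batch_size))  # steps needed to cover the data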

model.fit_generator(
    generator=generate_data(x, y, batch_size),
    steps_per_epoch=num_batches,
    validation_data=generate_data(x_val, y_val, batch_size),
    validation_steps=num_batches_test)
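
For the image case mentioned above, here is a minimal sketch of ImageDataGenerator streaming batches straight from disk; the directory path, image size, and one-sub-folder-per-class layout are assumptions, not from the question:

from keras.preprocessing.image import ImageDataGenerator

# Reads image files in batches from data/train/<class_name>/...
datagen = ImageDataGenerator(rescale=1. / 255)
train_gen = datagen.flow_from_directory('data/train',
                                        target_size=(224, 224),
                                        batch_size=128,
                                        class_mode='categorical')

model.fit_generator(train_gen,
                    steps_per_epoch=len(train_gen))

flow_from_directory infers the class labels from the sub-folder names, so no labels array has to be built in memory either.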

Upvotes: 6
