Nuno Pessanha Santos

Reputation: 313

How do I use a large dataset to train an autoencoder using Keras?

I have a large image dataset that I cannot load directly into memory. I am trying to explore some alternatives, but so far I have not succeeded.

import tensorflow as tf
from tensorflow import keras

# I am only showing the training dataset for brevity
train_image_folder_X = keras.utils.image_dataset_from_directory(
    directory=train_image_folder,
    labels=None,
    label_mode=None,
    batch_size=batch_size_1,
    image_size=(image_size_, image_size_))

train_image_folder_Y = keras.utils.image_dataset_from_directory(
    directory=train_annot_folder,
    labels=None,
    label_mode=None,
    batch_size=batch_size_1,
    image_size=(image_size_, image_size_))

# I already have my model
autoencoder = keras.models.load_model("Model")

autoencoder.fit(
    x=X_train,
    y=Y_train,
    epochs=epochs_number,
    batch_size=batch_size_number,
    #shuffle=True,
    validation_data=(X_val, Y_val)
)

I have already tried to map the input, but it does not train on the entire dataset:

X_train = train_image_folder_X.map(lambda x: x, num_parallel_calls = tf.data.AUTOTUNE)
X_train = next(iter(X_train))
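For context on why this fails: next(iter(...)) pulls a single batch out of the pipeline, so fit() only ever sees those few images. A tf.data dataset can instead be passed to fit() directly, and batches are then streamed from disk. A minimal sketch, assuming both datasets above were created with shuffle=False (otherwise the input and annotation file orders will not match):

# Pair each input batch with the matching annotation batch.
train_dataset = tf.data.Dataset.zip(
    (train_image_folder_X, train_image_folder_Y))

# Prefetch overlaps disk reads with training.
train_dataset = train_dataset.prefetch(tf.data.AUTOTUNE)

# fit() consumes the dataset batch by batch; the full set of
# images never has to be loaded into memory at once.
autoencoder.fit(train_dataset, epochs=epochs_number)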

I have 328,814 input/output image pairs for training, and 32,881 pairs for validation.

With:

batch_size_1 = 128
batch_size_number = 10
epochs_number = 1


I have also explored some blog posts and other alternatives, but without success.

How can I train on a large dataset using Keras? What am I missing here?

Upvotes: -1

Views: 334

Answers (1)

Nuno Pessanha Santos

Reputation: 313

I have tested flow_from_directory().

Using Python's built-in zip() I was able to solve the problem:

from tensorflow.keras.preprocessing.image import ImageDataGenerator

test_datagen_train_X = ImageDataGenerator()

test_generator_X_Train = test_datagen_train_X.flow_from_directory(
    train_image_folder_X,
    target_size=(image_size_w, image_size_h),
    batch_size=batch_size_train,
    class_mode=None,
    shuffle=False,
    seed=10)
(.........)
(.........)
train_dataset = zip(test_generator_X_Train, test_generator_Y_Train)
validation_dataset = zip(test_generator_val_X, test_generator_val_Y)
(.........)
(.........)
autoencoder.fit(
    train_dataset,
    steps_per_epoch=(len(test_generator_X_Train) / batch_size_number),
    epochs=1,
    validation_data=validation_dataset,
    validation_steps=(len(test_generator_val_X) / batch_size_number),
    callbacks=[csv_logger])

With this sequence I am now able to train my autoencoder on the large dataset.
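For completeness, here is a self-contained sketch of the whole sequence. The annotation and validation generators, and the folder variables train_annot_folder, val_image_folder, and val_annot_folder, are filled in by analogy and are hypothetical. Note also that len() of a flow_from_directory generator already returns the number of batches per epoch, so this sketch passes it to steps_per_epoch directly rather than dividing by the batch size as above:

from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator()

def make_generator(folder):
    # shuffle=False keeps the X and Y file orders aligned
    return datagen.flow_from_directory(
        folder,
        target_size=(image_size_w, image_size_h),
        batch_size=batch_size_train,
        class_mode=None,  # yield only images, no labels
        shuffle=False,
        seed=10)

gen_X_train = make_generator(train_image_folder_X)
gen_Y_train = make_generator(train_annot_folder)  # hypothetical variable
gen_X_val = make_generator(val_image_folder)      # hypothetical variable
gen_Y_val = make_generator(val_annot_folder)      # hypothetical variable

# zip() pairs each input batch with its annotation batch.
train_dataset = zip(gen_X_train, gen_Y_train)
validation_dataset = zip(gen_X_val, gen_Y_val)

autoencoder.fit(
    train_dataset,
    steps_per_epoch=len(gen_X_train),  # len() is already batches per epoch
    epochs=1,
    validation_data=validation_dataset,
    validation_steps=len(gen_X_val))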

Upvotes: 2
