Reputation: 313
I have a large image dataset, and I cannot load it directly into memory. I am exploring some alternatives, but so far without success.
# I am only showing the training dataset here
train_image_folder_X = keras.utils.image_dataset_from_directory(
    directory=train_image_folder,
    labels=None,
    label_mode=None,
    batch_size=batch_size_1,
    image_size=(image_size_, image_size_))
train_image_folder_Y = keras.utils.image_dataset_from_directory(
    directory=train_annot_folder,
    labels=None,
    label_mode=None,
    batch_size=batch_size_1,
    image_size=(image_size_, image_size_))
# I already have my model
autoencoder = keras.models.load_model("Model")
autoencoder.fit(
    x=X_train,
    y=Y_train,
    epochs=epochs_number,
    batch_size=batch_size_number,
    # shuffle=True,
    validation_data=(X_val, Y_val)
)
I have already tried to map the input, but it does not train on the entire dataset, since next(iter(...)) only returns a single batch:
X_train = train_image_folder_X.map(lambda x: x, num_parallel_calls=tf.data.AUTOTUNE)
X_train = next(iter(X_train))
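For reference (this check is not in my original code), printing the shape shows that only one batch is extracted, so the remaining images are never seen by fit():
print(X_train.shape)  # a single batch, e.g. (128, image_size_, image_size_, 3), assuming batch_size_1 = 128 and RGB images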
I have 328,814 images each for input and output, and 32,881 each for validation input and output, with:
batch_size_1 = 128
batch_size_number = 10
epochs_number = 1
I have also explored some blogs and other alternatives, but without success.
How can I train on a large dataset using Keras? What am I missing here?
Upvotes: -1
Views: 334
Reputation: 313
I tested flow_from_directory(), and by pairing the generators with Python's built-in zip() I was able to solve the problem:
from tensorflow.keras.preprocessing.image import ImageDataGenerator

test_datagen_train_X = ImageDataGenerator()
test_generator_X_Train = test_datagen_train_X.flow_from_directory(
    train_image_folder_X,
    target_size=(image_size_w, image_size_h),
    batch_size=batch_size_train,
    class_mode=None,  # no labels; the generator yields only image batches
    shuffle=False,    # keep file order so inputs and annotations stay aligned
    seed=10)
(.........)
(.........)
# Pair the input and target generators; each step yields an (x_batch, y_batch) tuple
train_dataset = zip(test_generator_X_Train, test_generator_Y_Train)
validation_dataset = zip(test_generator_val_X, test_generator_val_Y)
(.........)
(.........)
autoencoder.fit(
    train_dataset,
    steps_per_epoch=(len(test_generator_X_Train) / batch_size_number),
    epochs=1,
    validation_data=validation_dataset,
    validation_steps=(len(test_generator_val_X) / batch_size_number),
    callbacks=[csv_logger]
)
With this setup I am able to train my autoencoder on the large dataset.
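For completeness, the same pairing can be done with the tf.data datasets from the question via tf.data.Dataset.zip. This is only a sketch (not the code I ran), and it assumes the input and annotation folders list files in the same order, so shuffle=False keeps the pairs aligned:
import tensorflow as tf
from tensorflow import keras

train_image_folder_X = keras.utils.image_dataset_from_directory(
    directory=train_image_folder,
    labels=None,
    batch_size=batch_size_1,
    image_size=(image_size_, image_size_),
    shuffle=False)  # keep file order so inputs and targets stay paired
train_image_folder_Y = keras.utils.image_dataset_from_directory(
    directory=train_annot_folder,
    labels=None,
    batch_size=batch_size_1,
    image_size=(image_size_, image_size_),
    shuffle=False)

# Each element of the zipped dataset is an (input_batch, target_batch) tuple,
# which fit() can consume directly without loading everything into memory
train_dataset = tf.data.Dataset.zip(
    (train_image_folder_X, train_image_folder_Y)).prefetch(tf.data.AUTOTUNE)

autoencoder.fit(train_dataset, epochs=epochs_number)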
Upvotes: 2