Ouismed

Reputation: 357

How to work with a large training set when dealing with auto-encoders on Google Colaboratory?

I am training an auto-encoder (Keras) on Google Colab with 25,000 input images and 25,000 output images. So far I have tried to:

  1. Copy the large file from Google Drive to Colab each time (takes 5-6 hours).
  2. Convert the set to a NumPy array, but when I normalize the images the size grows a lot (from 7 GB to 24 GB, for example) and it no longer fits in RAM.
  3. Zip and unzip my data, without success.

So please, if anyone knows how to convert the images into a NumPy array (and normalize them) without ending up with a huge file (24 GB), let me know.

Upvotes: 0

Views: 161

Answers (1)

B Douchet

Reputation: 1020

What I usually do:

  1. Zip all the images and upload the .zip file to your Google Drive
  2. Unzip it in your Colab (see the snippet below, and the Drive-mounting sketch right after it):
from zipfile import ZipFile

# Extract the archive onto the fast local Colab disk
with ZipFile('data.zip', 'r') as archive:
    archive.extractall()
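If data.zip is still sitting on your Drive, you first have to mount the Drive so the notebook can see the file. A minimal sketch, assuming the archive was uploaded as MyDrive/data.zip and the images should end up under /content/data (both paths are only examples):

from zipfile import ZipFile
from google.colab import drive

# Make Google Drive visible under /content/drive (asks for authorization once)
drive.mount('/content/drive')

# Extract straight from the Drive copy onto the much faster local Colab disk
with ZipFile('/content/drive/MyDrive/data.zip', 'r') as archive:
    archive.extractall('/content/data')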
  3. All your images are now unzipped and stored on the Colab disk, so you can access them much faster.
  4. Use generators in Keras, like flow_from_directory, or create your own generator (a fuller sketch follows the note below).
  5. Use your generator when you fit your model:
model.fit(train_generator,
          steps_per_epoch=ntrain // batch_size,
          epochs=epochs,
          validation_data=val_generator,
          validation_steps=nval // batch_size)

where ntrain and nval are the number of images in your training and validation datasets.
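Putting it together for an auto-encoder: ImageDataGenerator can normalize on the fly (rescale=1./255), so the full 24 GB float array never has to sit in RAM, and class_mode='input' makes flow_from_directory yield (image, image) pairs, which is exactly what an auto-encoder trains on. This is only a sketch under assumptions: the extraction path /content/data, the 128x128 image size and the tiny model are placeholders for your own setup.

from tensorflow.keras import layers, models
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Normalization happens batch by batch as images are read from disk,
# so only one batch of floats is ever in memory at a time.
datagen = ImageDataGenerator(rescale=1.0 / 255, validation_split=0.1)

# flow_from_directory expects the images to sit in at least one subfolder
# of this directory; class_mode='input' yields (image, image) pairs.
train_gen = datagen.flow_from_directory(
    '/content/data',            # assumed extraction path
    target_size=(128, 128),     # assumed image size
    batch_size=32,
    class_mode='input',
    subset='training')
val_gen = datagen.flow_from_directory(
    '/content/data',
    target_size=(128, 128),
    batch_size=32,
    class_mode='input',
    subset='validation')

# Minimal convolutional auto-encoder, only here to make the sketch runnable
autoencoder = models.Sequential([
    layers.Input(shape=(128, 128, 3)),
    layers.Conv2D(32, 3, activation='relu', padding='same'),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation='relu', padding='same'),
    layers.UpSampling2D(),
    layers.Conv2D(3, 3, activation='sigmoid', padding='same'),
])
autoencoder.compile(optimizer='adam', loss='mse')

autoencoder.fit(train_gen,
                steps_per_epoch=train_gen.samples // train_gen.batch_size,
                epochs=10,
                validation_data=val_gen,
                validation_steps=val_gen.samples // val_gen.batch_size)

Doing the division by 255 inside the generator is what avoids the blow-up from 7 GB of uint8 images to 24 GB of floats: each batch is converted to float only while it is being used.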

Upvotes: 1
