RichHacker

Reputation: 41

Can't load my dataset to train my model on Google Colab

I am currently facing the problem of dealing with a large dataset: I cannot download it directly into Google Colab because of the limited disk space Colab provides (37 GB in my case). From some research, it seems the available space depends on the runtime you get assigned; for some people the disk is larger.

So my question is: can I download the dataset to a server such as Google Cloud and then load it from the server? The dataset is roughly 20 GB. The reason 37 GB is not enough is that after downloading the zip file you still have to extract it, which requires an additional 20 GB; but if I download and extract the files on the server, I would only use 20 GB on Google Colab. Any other suggestion is welcome. My end goal is to train a model on the COCO dataset.
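Something like the following is what I have in mind, assuming I had already uploaded the extracted images to a Google Cloud Storage bucket and read them with a library such as gcsfs (the project id, bucket name and file name below are placeholders, not real values):

import gcsfs
from PIL import Image

# Connect to Google Cloud Storage (project id is a placeholder).
fs = gcsfs.GCSFileSystem(project='my-project')

# Read one image straight from the bucket, using no Colab disk space.
# 'my-coco-bucket' and the file name are hypothetical examples.
with fs.open('my-coco-bucket/train2017/000000000009.jpg', 'rb') as f:
    img = Image.open(f)
    img.load()  # force the actual read while the file handle is open

print(img.size)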

Upvotes: 0

Views: 1227

Answers (1)

Viraf

Reputation: 141

One more approach could be uploading just the annotations file to Google Colab; there's no need to download the image dataset at all. We will make use of the pycocotools COCO API. Then, when preparing an image, instead of reading the image file from Drive or a local folder, you can read it directly from its URL:

import skimage.io as io  # scikit-image can read both local paths and URLs

# The usual method: read the image from a folder / Drive
I = io.imread('%s/images/%s/%s' % (dataDir, dataType, img['file_name']))

# Instead, read the image directly from its URL
I = io.imread(img['coco_url'])

This method will save you plenty of space, download time and effort. However, you'll need a working internet connection during training to fetch the images (which you have anyway, since you are using Colab).
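For completeness, here is a minimal end-to-end sketch of this approach, assuming you have uploaded only the 2017 validation annotations file to Colab (the file path and the 'person' category below are just examples):

from pycocotools.coco import COCO
import skimage.io as io

# Load just the annotations -- a few hundred MB instead of ~20 GB of images.
coco = COCO('annotations/instances_val2017.json')

# Pick the images that contain a given category, e.g. 'person'.
cat_ids = coco.getCatIds(catNms=['person'])
img_ids = coco.getImgIds(catIds=cat_ids)

# Fetch one image over HTTP instead of reading it from disk.
img = coco.loadImgs(img_ids[0])[0]
I = io.imread(img['coco_url'])  # downloads the JPEG from the COCO image server
print(I.shape)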

If you are interested in exploring the COCO dataset further, you can have a look at my post on Medium.

Upvotes: 2
