Zenquiorra

Reputation: 280

Why is my large file so small after extraction on Google Colab?

I am using the 'Dogs vs. Cats Redux: Kernels Edition' dataset from Kaggle for a deep learning model.

import os
from getpass import getpass

user = getpass('Kaggle Username: ')
key = getpass('Kaggle API key: ')

# Create the Kaggle config directory if it does not exist yet
if '.kaggle' not in os.listdir('/root'):
    !mkdir /root/.kaggle

# Write the API credentials where the Kaggle CLI expects them
with open('/root/.kaggle/kaggle.json', 'w') as f:
    f.write('{"username":"%s","key":"%s"}' % (user, key))

# Restrict permissions, otherwise the Kaggle CLI warns about them
!chmod 600 /root/.kaggle/kaggle.json


!kaggle competitions download -c dogs-vs-cats-redux-kernels-edition

I have downloaded it in my Colab notebook environment; the total dataset size (test + train) is a little over 800 MB.

ls -sh
    112K sample_submission.csv  272M test.zip  544M train.zip

However, when I extract the train and test zips, why is the size of the extracted files so small?

unzip test.zip && unzip train.zip
ls -sh
    total 816M
    112K sample_submission.csv  272M test.zip  544M train.zip
    276K test           752K train

The unzip runs without quiet mode, so I can see the files being extracted one by one.

I can also see the images inside the test folder, and they open fine through the file browser in the sidebar.

I thought this was some size display bug in the ls command and that the files really were extracted, but when I run the training code, it throws an error about images not being found.

I tested unzip by uploading a small dataset and extracting it, and that worked fine, so unzip itself is working; the same goes for 7z and extracting with Python.
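For reference, the Python route was roughly this (a minimal sketch; the archive name and target directory are just the ones from this dataset):

import os
import zipfile

# Extract the archive and count its entries
with zipfile.ZipFile('train.zip') as zf:
    zf.extractall('train')
    archive_count = len(zf.namelist())

# Count what actually landed on disk and compare
disk_count = sum(len(files) for _, _, files in os.walk('train'))
print(archive_count, disk_count)

Comparing the entry count from the archive with the file count on disk is a quick way to check whether the extraction really produced the files.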

Any approach to the problem or an alternative solution would be helpful.

Upvotes: 0

Views: 688

Answers (1)

cha0site

Reputation: 10717

You're looking at the size of the directory entries themselves, not the size of their contents. A directory's own size only reflects the list of filenames it holds, so a folder containing thousands of small images can show as a few hundred KB in ls -sh even though the files inside total hundreds of MB.

Try checking the size with du instead.
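For example (du -s sums the contents of each directory recursively, and -h prints human-readable sizes):

du -sh test train

This should report totals close to the sizes of the original zips, confirming the images really were extracted.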

Upvotes: 1
