Reputation: 280
I am using the 'Dogs vs. Cats Redux: Kernels Edition' dataset from Kaggle for a deep learning model.
import os
from getpass import getpass
user = getpass('Kaggle Username: ')
key = getpass('Kaggle API key: ')
if '.kaggle' not in os.listdir('/root'):
    !mkdir ~/.kaggle
!touch /root/.kaggle/kaggle.json
!chmod 666 /root/.kaggle/kaggle.json
with open('/root/.kaggle/kaggle.json', 'w') as f:
    f.write('{"username":"%s","key":"%s"}' % (user, key))
!kaggle competitions download -c dogs-vs-cats-redux-kernels-edition
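(Side note, unrelated to the problem below: the Kaggle CLI warns when kaggle.json is readable by other users, so 600 is the usually recommended permission instead of 666:
!chmod 600 /root/.kaggle/kaggle.json
)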
I have downloaded it in my Colab notebook environment; the total dataset size (test + train) is just over 800 MB.
ls -sh
112K sample_submission.csv 272M test.zip 544M train.zip
However, when I extract the train and test zips, why is the size of the extracted files so small?
unzip test.zip && unzip train.zip
ls -sh
total 816M
112K sample_submission.csv 272M test.zip 544M train.zip
276K test 752K train
The unzip runs without quiet mode, so I can see the files being extracted one by one. I can also see the images inside the test folder, and they are fully accessible through the file browser in the sidebar. I thought this was a size display bug in the ls command and that the files were really extracted, but when I run the training code, it throws errors about images not being found.
I unzipped some files by uploading a small dataset locally and they worked fine, so unzip itself is fine; the same goes for 7z and for unzipping from Python.
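As a sanity check, I can count the extracted entries directly (a minimal sketch; test and train are the folder names from the listing above):
import os
for d in ('test', 'train'):
    print(d, len(os.listdir(d)), 'entries')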
Any approach to the problem or an alternate solution would be helpful.
Upvotes: 0
Views: 688
Reputation: 10717
You're looking at the size of the directory instead of the size of its contents.
Try checking the size with du instead.
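With ls -s, the number next to test and train is the size of the directory file itself (which grows with the number of entries it holds), not the total size of the files inside it. du sums the contents; for example, using the folder names from your listing:
du -sh test train   # -s: one summary line per argument, -h: human-readable sizes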
Upvotes: 1