Reputation: 567
I've read dozens of questions about unzipping files in Google Colab. Mine is different, as explained below.
I need to unzip a zip file on Google Colab in order to perform some computation on the images in it. The problem is that all the different utilities that I've used don't recognize the zip file as a zip file.
What I do is upload images.zip to GDrive, share it, and copy its link. Then I download images.zip in Colab:
import urllib.request
import os
drive_url = 'the_link_to_the_zip_file'
file_name = 'images.zip'
urllib.request.urlretrieve(drive_url, file_name)
os.listdir()
This returns ['.config', 'images.zip', 'drive', 'sample_data'], so the file was successfully downloaded.
Now I would like to unzip it.
Using zipfile
import zipfile
zip_ref = zipfile.ZipFile("images.zip", "r")
zip_ref.extractall()
zip_ref.close()
The error that I get:
BadZipFile Traceback (most recent call last)
<ipython-input-41-eca398f38f4a> in <module>()
----> 1 zip_ref = zipfile.ZipFile("xyz.zip", "r")
2 zip_ref.extractall()
3 zip_ref.close()
1 frames
/usr/lib/python3.6/zipfile.py in __init__(self, file, mode, compression, allowZip64)
1129 try:
1130 if mode == 'r':
-> 1131 self._RealGetContents()
1132 elif mode in ('w', 'x'):
1133 # set the modified flag so central directory gets written
/usr/lib/python3.6/zipfile.py in _RealGetContents(self)
1196 raise BadZipFile("File is not a zip file")
1197 if not endrec:
-> 1198 raise BadZipFile("File is not a zip file")
1199 if self.debug > 1:
1200 print(endrec)
BadZipFile: File is not a zip file
Using unzip
!unzip -uq "images.zip" -d "/content/drive/My Drive/Test"
The error that I get:
End-of-central-directory signature not found. Either this file is not
a zipfile, or it constitutes one disk of a multi-part archive. In the
latter case the central directory and zipfile comment will be found on
the last disk(s) of this archive.
unzip: cannot find zipfile directory in one of images.zip or
images.zip.zip, and cannot find images.zip.ZIP, period.
Obviously, images.zip is a perfectly fine zip file that I can open and modify both on my computer and online using Google Drive.
Note: I get the same result when uploading a zip file that I created on my computer. Initially I thought that maybe my zip utility was broken, but now it seems that what is broken is Google Colab...
Note2: Accessing images.zip directly in Drive and unzipping it from there isn't a solution, because I may need to download a zip from someone else's Drive.
Many thanks
Upvotes: 3
Views: 14323
Reputation: 600
I think I see what the problem is. It looks like the file you are trying to extract is not a zip. Try this to verify whether it really is a zip file.
!apt install file
!file <location_of_zip_file>
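The same check can be sketched in Python without installing anything. This is only an illustration: `inspect_download` is a hypothetical helper name, not part of any library, and it relies on the standard-library `zipfile.is_zipfile` plus a peek at the first few bytes of the file.

```python
import zipfile


def inspect_download(path, peek=16):
    """Return (is_zip, head): whether `path` is a zip archive, plus its first bytes."""
    with open(path, "rb") as fh:
        head = fh.read(peek)
    # zipfile.is_zipfile looks for the end-of-central-directory record,
    # the same structure the errors in the question complain about.
    return zipfile.is_zipfile(path), head
```

In Colab this could be called as `inspect_download("images.zip")`. A real archive normally starts with `b'PK'`; a head that starts with something like `b'<!DOCTYPE html'` or `b'<html'` suggests that a web page was saved under the name images.zip.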
I suspect that the file you downloaded is not a zip file because you might not have provided the direct URL to the file. A Drive sharing link typically returns an HTML page rather than the file itself.
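As a rough sketch of what a direct URL might look like: the helper below (`drive_direct_url`, a hypothetical name) pulls the file id out of a sharing link and builds a download URL. It assumes the sharing link looks like `https://drive.google.com/file/d/<id>/view?...` or contains `?id=<id>`, and that Drive serves the file at `uc?export=download&id=<id>`. That URL form is a commonly used convention, not something confirmed in the question, and for large files Drive may still answer with an HTML confirmation page instead of the archive.

```python
import re

# Assumed shapes of Drive sharing links; other forms are not handled.
_ID_PATTERNS = (
    r"/file/d/([A-Za-z0-9_-]+)",
    r"[?&]id=([A-Za-z0-9_-]+)",
)


def drive_direct_url(share_link):
    """Build a direct-download URL from a Drive sharing link (assumed URL form)."""
    for pattern in _ID_PATTERNS:
        match = re.search(pattern, share_link)
        if match:
            return "https://drive.google.com/uc?export=download&id=" + match.group(1)
    raise ValueError("could not find a file id in: " + share_link)
```

The result could then be passed to `urllib.request.urlretrieve` in place of the sharing link. It is still worth checking the downloaded file afterwards, since the conversion alone does not guarantee that the response is the archive.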
Upvotes: 1