Reputation: 2216
I already have a zip of (2K images) dataset on a google drive. I have to use it in a ML training algorithm. Below Code extracts the content in a string format:
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
import io
import zipfile
# Authenticate and create the PyDrive client.
# This only needs to be done once per notebook.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)
# Download a file based on its file ID.
#
# A file ID looks like: laggVyWshwcyP6kEI-y_W3P8D26sz
file_id = '1T80o3Jh3tHPO7hI5FBxcX-jFnxEuUE9K' #-- Updated File ID for my zip
downloaded = drive.CreateFile({'id': file_id})
#print('Downloaded content "{}"'.format(downloaded.GetContentString(encoding='cp862')))
But I have to extract and store it in a separate directory as it would be easier for processing (as well as for understanding) of the dataset.
I tried to extract it further, but getting "Not a zipfile error"
dataset = io.BytesIO(downloaded.encode('cp862'))
zip_ref = zipfile.ZipFile(dataset, "r")
zip_ref.extractall()
zip_ref.close()
Note: Dataset is just for reference, I have already downloaded this zip to my google drive, and I'm referring to file in my drive only.
Upvotes: 75
Views: 400775
Reputation: 1448
If you are dealing with a zip file, like for me it is mostly thousands of images and I want to store them in a folder within the drive, then do this --
!unzip -u "/content/drive/My Drive/folder/example.zip" -d "/content/drive/My Drive/folder/NewFolder"
-u
part controls extraction only if new/necessary. It is important if suddenly you lose connection or hardware switches off.
-d
creates the directory and extracted files are stored there.
Of course before doing this you need to mount your drive
from google.colab import drive
drive.mount('/content/drive')
I hope this helps! Cheers!!
Upvotes: 24
Reputation: 77
This is what worked for me.
!apt install unzip
Then I used this code to unzip the file
!unzip /content/file.zip -d /content/
Without installing unzip
on Colab first you'll always receive error messages.
Upvotes: 3
Reputation: 187
Please use this command in google colab
Unzip the file you want to extract and then the location
!unzip "drive/My Drive/Project/yourfilename.zip" -d "drive/My Drive/Project/yourfolder"
Upvotes: 9
Reputation: 21
We assumed that you're already mounting your googleDrive to googleColab. in case if you just want to extract zip file thats contains .csv extention. just call pandas attribute read_csv
pd.read_csv('/content/drive/My Drive/folder/example.zip')
Upvotes: 0
Reputation: 15
in my idea, you must go to a certain path for example:
from google.colab import drive drive.mount('/content/drive/') cd drive/MyDrive/f/
then :
!apt install unzip !unzip zip_folder.zip -d unzip_folder enter image description here
Upvotes: -2
Reputation: 1147
Try this:
!unpack file.zip
If its now working or file is 7z try below
!apt-get install p7zip-full
!p7zip -d file_name.tar.7z
!tar -xvf file_name.tar
Or
!pip install pyunpack
!pip install patool
from pyunpack import Archive
Archive(‘file_name.tar.7z’).extractall(‘path/to/’)
!tar -xvf file_name.tar
Upvotes: 0
Reputation: 534
For Python
Connect to drive,
from google.colab import drive
drive.mount('/content/drive')
Check for directory
!ls
and !pwd
For unzip
!unzip drive/"My Drive"/images.zip
Upvotes: 5
Reputation: 335
After mounting on drive, use shutil.unpack_archive. It works with almost all archive formats (e.g., “zip”, “tar”, “gztar”, “bztar”, “xztar”) and it's simple:
import shutil
shutil.unpack_archive("filename", "path_to_extract")
Upvotes: 4
Reputation: 3258
TO unzip a file to a directory:
!unzip path_to_file.zip -d path_to_directory
Upvotes: 103
Reputation: 3033
Mount GDrive:
from google.colab import drive
drive.mount('/content/gdrive')
Open the link -> copy authorization code -> paste that into the prompt and press "Enter"
Check GDrive access:
!ls "/content/gdrive/My Drive"
Unzip (q stands for "quiet") file from GDrive:
!unzip -q "/content/gdrive/My Drive/dataset.zip"
Upvotes: 7
Reputation: 1033
First create a new directory:
!mkdir file_destination
Now, it's the time to inflate the directory with the unzipped files with this:
!unzip file_location -d file_destination
Upvotes: 6
Reputation: 91
First, install unzip on colab:
!apt install unzip
then use unzip to extract your files:
!unzip source.zip -d destination.zip
Upvotes: 9
Reputation: 1389
To extract Google Drive zip from a Google colab notebook:
import zipfile
from google.colab import drive
drive.mount('/content/drive/')
zip_ref = zipfile.ZipFile("/content/drive/My Drive/ML/DataSet.zip", 'r')
zip_ref.extractall("/tmp")
zip_ref.close()
Upvotes: 30
Reputation: 129
SIMPLE WAY TO CONNECT
1) You'll have to verify authentication
from google.colab import auth
auth.authenticate_user()
from oauth2client.client import GoogleCredentials
creds = GoogleCredentials.get_application_default()
2)To fuse google drive
!apt-get install -y -qq software-properties-common python-software-properties module-init-tools
!add-apt-repository -y ppa:alessandro-strada/ppa 2>&1 > /dev/null
!apt-get update -qq 2>&1 > /dev/null
!apt-get -y install -qq google-drive-ocamlfuse fuse
3)To verify credentials
import getpass
!google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret} < /dev/null 2>&1 | grep URL
vcode = getpass.getpass()
!echo {vcode} | google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret}
4)Create a drive name to use it in colab ('gdrive') and check if it's working
!mkdir gdrive
!google-drive-ocamlfuse gdrive
!ls gdrive
!cd gdrive
Upvotes: 1
Reputation: 40808
Instead of GetContentString()
, use GetContentFile() instead. It will save the file instead of returning the string.
downloaded.GetContentFile('images.zip')
Then you can unzip it later with unzip
.
Upvotes: 1