Reputation: 2998
I have some data on gDrive, for example at /projects/my_project/my_data*.
Also I have a simple notebook in gColab.
So, I would like to do something like:
for file in glob.glob("/projects/my_project/my_data*"):
    do_something(file)
Unfortunately, all the examples (like this one, for example: https://colab.research.google.com/notebook#fileId=/v2/external/notebooks/io.ipynb) suggest loading all of the necessary data into the notebook first.
But if I have many pieces of data, that can get quite complicated. How can I solve this?
Upvotes: 275
Views: 849355
Reputation: 2601
Read images from google drive using colab notebook:
import glob
images_list = glob.glob("add google drive path/*.jpg")
print(images_list)
Create the train.txt file required for YOLOv4 training:
file = open("/content/drive/MyDrive/project data/obj/train.txt", "w")
file.write("\n".join(images_list))
file.close()
Upvotes: 0
Reputation: 52516
27/12/2022 Vy update:
from google.colab import drive
drive.mount('/content/gdrive/')
Upvotes: 6
Reputation: 2020
Consider just downloading the file through its permanent link with gdown, which comes preinstalled on Colab.
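A minimal sketch of that approach, assuming the file is shared so that anyone with the link can read it. The helper names and `FILE_ID` are placeholders of mine; the only library call used is `gdown.download(url, output, quiet=...)`.

```python
def drive_download_url(file_id):
    """Build the direct-download URL form that gdown accepts for a Drive file id."""
    return "https://drive.google.com/uc?id={}".format(file_id)


def fetch_from_drive(file_id, output):
    """Download one publicly shared Drive file into the Colab VM."""
    import gdown  # imported here so the URL helper works without gdown installed

    return gdown.download(drive_download_url(file_id), output, quiet=False)


# Example (hypothetical id and file name):
# fetch_from_drive("FILE_ID", "my_data.csv")
```

This avoids mounting the whole Drive, but it has to be called once per file, so it suits a handful of files better than a large folder.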
Upvotes: 0
Reputation: 1628
What I have done is first:
from google.colab import drive
drive.mount('/content/drive/')
Then
%cd /content/drive/My Drive/Colab Notebooks/
After that I can, for example, read CSV files with
import pandas as pd
df = pd.read_csv("data_example.csv")
If your files are in a different location, just add the correct path after My Drive.
Upvotes: 95
Reputation: 344
from google.colab import drive
drive.mount('/content/drive')
This worked perfectly for me. I was later able to use the os library to access my files just like I access them on my PC.
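A minimal sketch of what that can look like, using only the standard os module. `list_files` is a hypothetical helper, and the folder in the usage comment is an assumed location under the mount point above.

```python
import os


def list_files(root):
    """Return the paths of all regular files below root, sorted."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            found.append(os.path.join(dirpath, name))
    return sorted(found)


# On Colab, after drive.mount('/content/drive') (assumed folder name):
# for path in list_files('/content/drive/My Drive/my_project'):
#     print(path, os.path.getsize(path))
```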
Upvotes: 5
Reputation: 978
To read all files in a folder:
import glob
from google.colab import drive
drive.mount('/gdrive', force_remount=True)
#!ls "/gdrive/My Drive/folder"
files = glob.glob("/gdrive/My Drive/folder/*.txt")
for file in files:
    do_something(file)
Upvotes: 5
Reputation: 38579
Edit: As of February, 2020, there's now a first-class UI for automatically mounting Drive.
First, open the file browser on the left hand side. It will show a 'Mount Drive' button. Once clicked, you'll see a permissions prompt to mount Drive, and afterwards your Drive files will be present with no setup when you return to the notebook.
The original answer follows, below. (This will also still work for shared notebooks.)
You can mount your Google Drive files by running the following code snippet:
from google.colab import drive
drive.mount('/content/drive')
Then, you can interact with your Drive files in the file browser side panel or using command-line utilities.
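Once the Drive is mounted, the loop from the question can be written against the mounted path. A minimal sketch, assuming the '/content/drive' mount point above and that the data sits under 'My Drive/projects/my_project' (adjust to wherever it actually lives); `matching_files` is a hypothetical helper and `do_something` is the question's placeholder.

```python
import glob
import os


def matching_files(base_dir, pattern):
    """Return the sorted paths in base_dir whose names match a glob pattern."""
    return sorted(glob.glob(os.path.join(base_dir, pattern)))


# On Colab, after drive.mount('/content/drive'):
# base = '/content/drive/My Drive/projects/my_project'  # assumed location
# for path in matching_files(base, 'my_data*'):
#     do_something(path)
```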
Upvotes: 547
Reputation: 1399
To extract a zip file stored on Google Drive from a Google Colab notebook, for example:
import zipfile
from google.colab import drive
drive.mount('/content/drive/')
# The context manager closes the archive even if extraction fails.
with zipfile.ZipFile("/content/drive/My Drive/ML/DataSet.zip", 'r') as zip_ref:
    zip_ref.extractall("/tmp")
Upvotes: 1
Reputation: 2487
Most of the previous answers are a bit (very) complicated.
from google.colab import drive
drive.mount("/content/drive", force_remount=True)
I found this to be the easiest and fastest way to mount Google Drive in Colab. You can change the mount directory to whatever you want by changing the first argument to drive.mount. It will give you a link to accept the permissions with your account; you then copy and paste the generated key, and the Drive will be mounted at the selected path.
force_remount is only needed when you want to mount the Drive regardless of whether it was already mounted. You can omit this parameter if you don't want to force a remount.
Edit: See https://colab.research.google.com/notebooks/io.ipynb for more ways of doing IO operations in Colab.
Upvotes: 19
Reputation: 546
I wrote a class that downloads all of the data to the '.' location on the Colab server.
The whole thing can be pulled from https://github.com/brianmanderson/Copy-Shared-Google-to-Colab
!pip install PyDrive
import os
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials

# Authenticate and create the PyDrive client that the class uses.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)

class download_data_from_folder(object):
    def __init__(self, path):
        # The folder id is everything after 'id=' in the shared link.
        path_id = path[path.find('id=') + 3:]
        self.file_list = self.get_files_in_location(path_id)
        self.unwrap_data(self.file_list)

    def get_files_in_location(self, folder_id):
        # List the non-trashed children of the given folder.
        file_list = drive.ListFile({'q': "'{}' in parents and trashed=false".format(folder_id)}).GetList()
        return file_list

    def unwrap_data(self, file_list, directory='.'):
        for i, file in enumerate(file_list):
            print(str((i + 1) / len(file_list) * 100) + '% done copying')
            if file['mimeType'].find('folder') != -1:
                # Recreate the sub-folder locally, then recurse into it.
                if not os.path.exists(os.path.join(directory, file['title'])):
                    os.makedirs(os.path.join(directory, file['title']))
                print('Copying folder ' + os.path.join(directory, file['title']))
                self.unwrap_data(self.get_files_in_location(file['id']), os.path.join(directory, file['title']))
            else:
                # Download the file unless a local copy already exists.
                if not os.path.exists(os.path.join(directory, file['title'])):
                    downloaded = drive.CreateFile({'id': file['id']})
                    downloaded.GetContentFile(os.path.join(directory, file['title']))
        return None

data_path = 'shared_path_location'
download_data_from_folder(data_path)
Upvotes: 2
Reputation: 2537
I'm lazy and my memory is bad, so I created easycolab, which is easier to memorize and type:
import easycolab as ec
ec.mount()
Make sure to install it first: !pip install easycolab
The mount() method basically implements this:
from google.colab import drive
drive.mount('/content/drive')
%cd /content/drive/My Drive/
Upvotes: 5
Reputation: 3430
There are many ways to read files in your Colab notebook (.ipynb); a few are:
The first and second methods worked for me; I wasn't able to figure out the rest. If anyone has, as others tried in the posts above, please write an elegant answer. Thanks in advance!
First method:
I wasn't able to mount my google drive, so I installed these libraries
# Install a Drive FUSE wrapper.
# https://github.com/astrada/google-drive-ocamlfuse
!apt-get install -y -qq software-properties-common python-software-properties module-init-tools
!add-apt-repository -y ppa:alessandro-strada/ppa 2>&1 > /dev/null
!apt-get update -qq 2>&1 > /dev/null
!apt-get -y install -qq google-drive-ocamlfuse fuse
from google.colab import auth
auth.authenticate_user()
from oauth2client.client import GoogleCredentials
creds = GoogleCredentials.get_application_default()
import getpass
!google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret} < /dev/null 2>&1 | grep URL
vcode = getpass.getpass()
!echo {vcode} | google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret}
Once the installation & authorization process is finished, you first mount your drive.
!mkdir -p drive
!google-drive-ocamlfuse drive
After installation I was able to mount Google Drive; everything in your Google Drive starts from /content/drive.
!ls /content/drive/ML/../../../../path_to_your_folder/
Now you can simply read the file from the path_to_your_folder folder into pandas using the above path.
import pandas as pd
df = pd.read_json('drive/ML/../../../../path_to_your_folder/file.json')
df.head(5)
You are supposed to use the absolute path you received rather than one containing /../..
Second method:
This is convenient if the file you want to read is in the current working directory.
If you need to upload files from your local file system, you can use the code below; otherwise just skip it.
from google.colab import files
uploaded = files.upload()
for fn in uploaded.keys():
    print('User uploaded file "{name}" with length {length} bytes'.format(
        name=fn, length=len(uploaded[fn])))
Suppose you have the folder hierarchy below in your Google Drive:
/content/drive/ML/../../../../path_to_your_folder/
Then you simply need the code below to load the file into pandas.
import pandas as pd
import io
df = pd.read_json(io.StringIO(uploaded['file.json'].decode('utf-8')))
df
Upvotes: 2
Reputation: 7779
You can't permanently store a file on Colab. However, you can import files from your Drive, and whenever you are done with a file you can save it back.
To mount Google Drive in your Colab session:
from google.colab import drive
drive.mount('/content/gdrive')
You can then write to Google Drive as you would to a local file system. Your Google Drive will now appear in the Files tab, and you can read and write any file from Colab. The changes are made on your Drive in real time, and anyone with the access link to your file can see the changes you made from Colab.
Example
with open('/content/gdrive/My Drive/filename.txt', 'w') as f:
    f.write('values')
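For files produced elsewhere in the VM, copying them into the mounted folder has the same effect. A minimal sketch, assuming the Drive is mounted at '/content/gdrive' as above; `save_to_drive` and the folder name in the usage comment are hypothetical.

```python
import os
import shutil


def save_to_drive(local_path, drive_dir):
    """Copy a local file into a folder on the mounted Drive and return the new path."""
    os.makedirs(drive_dir, exist_ok=True)  # create the target folder if missing
    return shutil.copy2(local_path, drive_dir)


# On Colab (assumed folder name):
# save_to_drive('results.csv', '/content/gdrive/My Drive/my_project/outputs')
```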
Upvotes: 15
Reputation: 509
Thanks for the great answers! The fastest way to get a few one-off files from Google Drive to Colab:
# Load the Drive helper and mount
from google.colab import drive
# This will prompt for authorization.
drive.mount('/content/drive')
Open the link in a new tab; you will get a code. Copy it back into the prompt. You now have access to Google Drive. Check:
!ls "/content/drive/My Drive"
then copy file(s) as needed:
!cp "/content/drive/My Drive/xy.py" "xy.py"
confirm that files were copied:
!ls
Upvotes: 50
Reputation: 29
You can simply make use of the code snippets on the left of the screen.
Insert "Mounting Google Drive in your VM",
run the code, and copy and paste the code from the URL.
Then use !ls to check the directories:
!ls /gdrive
In most cases, you will find what you want in the directory "/gdrive/My Drive".
Then you can proceed like this:
from google.colab import drive
drive.mount('/gdrive')
import glob
file_path = glob.glob("/gdrive/My Drive/*.txt")
for file in file_path:
    do_something(file)
Upvotes: 1
Reputation: 2998
@wenkesj
I am talking about copying a directory and all of its subdirectories.
For me, I found a solution that looks like this:
def copy_directory(source_id, local_target):
    # Assumes `os` is imported and `drive` is an authenticated PyDrive client.
    try:
        os.makedirs(local_target)
    except:
        pass
    file_list = drive.ListFile(
        {'q': "'{source_id}' in parents".format(source_id=source_id)}).GetList()
    for f in file_list:
        # Show which entry is being processed.
        print(', '.join(['{}: {}'.format(key, f[key])
                         for key in ['title', 'id', 'mimeType']]))
        if f["title"].startswith("."):
            continue
        fname = os.path.join(local_target, f['title'])
        if f['mimeType'] == 'application/vnd.google-apps.folder':
            copy_directory(f['id'], fname)
        else:
            f_ = drive.CreateFile({'id': f['id']})
            f_.GetContentFile(fname)
Nevertheless, it looks like gDrive doesn't like copying too many files.
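If the failures on large folders are transient (an assumption; nothing here says which error Drive returns), one option is to retry each download with a growing delay. A minimal sketch; `with_retries` is a hypothetical helper, not part of PyDrive.

```python
import time


def with_retries(action, attempts=5, first_delay=1.0, sleep=time.sleep):
    """Call action() until it succeeds, doubling the wait after each failure."""
    delay = first_delay
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            sleep(delay)
            delay *= 2


# Inside copy_directory, the download line could then become:
# with_retries(lambda: f_.GetContentFile(fname))
```

The `sleep` parameter exists only so the waiting can be swapped out when trying the helper without real delays.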
Upvotes: 0
Reputation: 988
Good news, PyDrive has first class support on CoLab! PyDrive is a wrapper for the Google Drive python client. Here is an example of how you would download ALL files from a folder, similar to using glob + *:
!pip install -U -q PyDrive
import os
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
# 1. Authenticate and create the PyDrive client.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)
# choose a local (colab) directory to store the data.
local_download_path = os.path.expanduser('~/data')
try:
    os.makedirs(local_download_path)
except: pass
# 2. Auto-iterate using the query syntax
# https://developers.google.com/drive/v2/web/search-parameters
file_list = drive.ListFile(
    {'q': "'1SooKSw8M4ACbznKjnNrYvJ5wxuqJ-YCk' in parents"}).GetList()
for f in file_list:
    # 3. Create & download by id.
    print('title: %s, id: %s' % (f['title'], f['id']))
    fname = os.path.join(local_download_path, f['title'])
    print('downloading to {}'.format(fname))
    f_ = drive.CreateFile({'id': f['id']})
    f_.GetContentFile(fname)
    with open(fname, 'r') as f:
        print(f.read())
Notice that the argument to drive.ListFile is a dictionary that coincides with the parameters used by the Google Drive HTTP API (you can customize the q parameter to suit your use case).
Know that in all cases, files/folders are encoded by ids (peep the 1SooKSw8M4ACbznKjnNrYvJ5wxuqJ-YCk) on Google Drive. This requires that you search Google Drive for the specific id corresponding to the folder you want to root your search in.
For example, navigate to the folder "/projects/my_project/my_data" that is located in your Google Drive.
See that it contains some files, which we want to download to CoLab. To get the id of the folder in order to use it with PyDrive, look at the url and extract the id parameter. In this case, the id is the last piece of the url corresponding to the folder: 1SooKSw8M4ACbznKjnNrYvJ5wxuqJ-YCk.
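Pulling the id out of the url can also be done in code. A minimal sketch, assuming the url either carries the id in an `id=` query parameter or ends with it as the last path segment; `folder_id_from_url` is a hypothetical helper and the url shapes in the comments are assumptions, not taken from this answer.

```python
from urllib.parse import urlparse, parse_qs


def folder_id_from_url(url):
    """Return the Drive id from an 'id=' query parameter or the last path segment."""
    parsed = urlparse(url)
    query_ids = parse_qs(parsed.query).get("id")
    if query_ids:
        return query_ids[0]
    # Otherwise assume the id is the final non-empty path segment.
    segments = [s for s in parsed.path.split("/") if s]
    if not segments:
        raise ValueError("no id found in {!r}".format(url))
    return segments[-1]


# Assumed url shapes:
# folder_id_from_url("https://drive.google.com/open?id=<folder id>")
# folder_id_from_url("https://drive.google.com/drive/folders/<folder id>")
```

The returned id can then be substituted into the q parameter in place of the hard-coded one.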
Upvotes: 80