Srinivasa Rao

Reputation: 161

Interface between Google Colaboratory and Google Cloud

From Google Colaboratory, how do I read from and write to a folder in a given bucket created in Google Cloud Storage?

I have created a bucket and a folder within the bucket, and uploaded a bunch of images into it. Now, from a Colaboratory Jupyter notebook, I want to create multiple sub-directories to organise these images into train, validation and test folders.

Subsequently, I need to access the respective folders for training, validating and testing the model.

With Google Drive, after authentication, we just update the path to point to a specific directory with the following commands:

import sys
sys.path.append('drive/xyz')

We do something similar on the desktop version too:

import os
os.chdir(local_path)

Does something similar exist for Google Cloud Storage?

The Colaboratory FAQ has a procedure for reading and writing a single file, where we need to set the entire path. That would be tedious for re-organising a main directory into sub-directories and accessing them separately.

Upvotes: 6

Views: 6362

Answers (1)

Dan Cornilescu

Reputation: 39824

In general it's not a good idea to try to mount a GCS bucket on the local machine (which would allow you to use it as you mentioned). From Connecting to Cloud Storage buckets:

Note: Cloud Storage is an object storage system that does not have the same write constraints as a POSIX file system. If you write data to a file in Cloud Storage simultaneously from multiple sources, you might unintentionally overwrite critical data.

Assuming you'd like to continue regardless of the warning, if you use a Linux OS you may be able to mount it using the Cloud Storage FUSE adapter. See related How to mount Google Bucket as local disk on Linux instance with full access rights.

The recommended way to access GCS from Python apps is to use the Cloud Storage Client Libraries, but accessing files will work differently from your snippets. You can find some examples at Python Client for Google Cloud Storage:

from google.cloud import storage
client = storage.Client()
# https://console.cloud.google.com/storage/browser/[bucket-id]/
bucket = client.get_bucket('bucket-id-here')
# Then do other things...
blob = bucket.get_blob('remote/path/to/file.txt')
print(blob.download_as_string())
blob.upload_from_string('New contents!')
blob2 = bucket.blob('remote/path/storage.txt')
blob2.upload_from_filename(filename='/local/path.txt')
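Since GCS has no real directories (a "folder" is just a shared name prefix), re-organising your images into train/validation/test "folders" with this client means copying each blob to a new name. A minimal sketch, assuming an `images/` prefix in the bucket and a round-robin split (both are placeholders for your own layout and split logic):

```python
def split_destination(split: str, blob_name: str, prefix: str = "images") -> str:
    # Map e.g. 'images/cat.jpg' to 'images/train/cat.jpg'.
    # GCS "folders" are just name prefixes, so this is pure string work.
    filename = blob_name.rsplit("/", 1)[-1]
    return f"{prefix}/{split}/{filename}"

def organize(bucket_name: str) -> None:
    # Requires authenticated credentials in the environment.
    from google.cloud import storage

    client = storage.Client()
    bucket = client.get_bucket(bucket_name)
    for i, blob in enumerate(bucket.list_blobs(prefix="images/")):
        split = ("train", "validation", "test")[i % 3]  # illustrative assignment
        bucket.copy_blob(blob, bucket, split_destination(split, blob.name))
```

Afterwards, `bucket.list_blobs(prefix="images/train/")` gives you just the training set, which replaces the `os.chdir`-style navigation from the question.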

Update:

The Colaboratory doc recommends another method that I forgot about, based on the Google API Client Library for Python. Note that it also doesn't operate like a regular filesystem; it stages data through an intermediate file on the local filesystem.
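A minimal sketch of that staged approach: the helper below writes the data to a local file first, and the commented-out part shows roughly how the Colab I/O example notebook then uploads that file with the API client (the bucket and file names are placeholders):

```python
def stage_locally(data: bytes, path: str) -> str:
    # Write the data to an intermediate local file before uploading.
    with open(path, "wb") as f:
        f.write(data)
    return path

# The upload itself uses the Google API Client Library, roughly as in
# the Colab I/O example notebook (run after authenticating):
#
#   from googleapiclient.discovery import build
#   from googleapiclient.http import MediaFileUpload
#
#   gcs_service = build('storage', 'v1')
#   media = MediaFileUpload('/tmp/to_upload.txt',
#                           mimetype='text/plain', resumable=True)
#   gcs_service.objects().insert(bucket='bucket-name',
#                                name='to_upload.txt',
#                                media_body=media).execute()
```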

Upvotes: 8
