ct_sphon

Reputation: 63

Accessing data in Google Cloud bucket for a python Tensorflow learning program

I'm working through the Google quickstart examples for Cloud Machine Learning / TensorFlow shown here: https://cloud.google.com/ml/docs/quickstarts/training

I want my Python program to access data that I have stored in a Google Cloud Storage bucket such as gs://mybucket. How do I do this from inside my Python program rather than from the command line?

Specifically, the Cloud ML quickstart example uses data that Google provides, but what if I want to use my own data stored in a bucket such as gs://mybucket?

I noticed a similar post, "How can I get the Cloud ML service account programmatically in Python?", but I can't seem to install the googleapiclient module.

Some posts mention Apache Beam, but I can't tell whether it's relevant to me, and in any case I can't figure out how to download or install it.

Upvotes: 1

Views: 13511

Answers (2)

Yogesh Awdhut Gadade

Reputation: 2708

Assuming you are using Ubuntu/Linux and already have data in a GCS bucket, run the following commands from a terminal. They can also be run in a Jupyter Notebook (just prefix each command with !):

--------------------- Installation -----------------

First, install the storage module. In a terminal, type:

pip install google-cloud-storage

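Second, check that the gsutil command-line tool is available by typing the command below. Note that gsutil is not installed by the google-cloud-storage package (that package is the Python client library); gsutil comes with the Google Cloud SDK.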

gsutil 

(The output will list the available gsutil commands.)
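If you would rather run this check from inside a Python script, the following is a minimal sketch that reports whether the gsutil executable is on your PATH and whether the google-cloud-storage library can be imported. check_gcs_tools is a hypothetical helper name, not part of any Google library; it uses only the Python standard library.

```python
import importlib.util
import shutil


def check_gcs_tools() -> dict:
    """Report which GCS tools are available (hypothetical helper)."""
    # gsutil is a command-line tool; shutil.which returns its full path or None
    gsutil_path = shutil.which("gsutil")

    # google-cloud-storage provides the importable module google.cloud.storage.
    # find_spec imports the parent packages first and raises an ImportError
    # subclass (ModuleNotFoundError) if "google" or "google.cloud" is missing.
    try:
        has_client_lib = importlib.util.find_spec("google.cloud.storage") is not None
    except ImportError:
        has_client_lib = False

    return {"gsutil": gsutil_path, "google-cloud-storage": has_client_lib}
```

For example, calling check_gcs_tools() at the start of a training script lets you fail early with a clear message instead of hitting a later ImportError or FileNotFoundError.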

---------------------- Copy data from GCS bucket --------

Run this command to check whether you can get information about the bucket:

gsutil acl get gs://BucketName

Now copy the file from the GCS bucket to your machine:

gsutil cp gs://BucketName/FileName /PathToDestinationDir/

This copies the data from the bucket to your machine for further processing.

NOTE: all the above commands can be run from a Jupyter Notebook by prefixing them with !, e.g.:

!gsutil cp gs://BucketName/FileName /PathToDestinationDir/
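Since the question asks how to do this from within a Python program rather than from the command line, one option is to run the same gsutil cp command through Python's subprocess module. This is a minimal sketch, assuming gsutil is installed and already authenticated; build_gsutil_cp and copy_from_gcs are hypothetical helper names, and BucketName/FileName are the placeholders used above.

```python
import subprocess
from typing import List


def build_gsutil_cp(src: str, dest: str) -> List[str]:
    """Build the argument list for `gsutil cp SRC DEST` (hypothetical helper)."""
    # Passing a list rather than a shell string avoids shell-quoting problems
    return ["gsutil", "cp", src, dest]


def copy_from_gcs(src: str, dest: str) -> None:
    """Run gsutil cp.

    Raises CalledProcessError if gsutil exits with a non-zero status,
    or FileNotFoundError if gsutil is not on PATH.
    """
    subprocess.run(build_gsutil_cp(src, dest), check=True)
```

For example, copy_from_gcs("gs://BucketName/FileName", "/PathToDestinationDir/") performs the same copy as the command above. Using the google-cloud-storage client library directly avoids shelling out altogether.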

Upvotes: 2

Graham Polley
Graham Polley

Reputation: 14791

If I understand your question correctly, you want to programmatically talk to GCS in Python.

The official docs are a good place to start.

First, grab the module using pip:

pip install --upgrade google-cloud-storage

Then:

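from google.cloud import storage

# storage.Client() picks up Application Default Credentials
client = storage.Client()
bucket = client.get_bucket('bucket-id-here')

# Read an existing object (get_blob returns None if it doesn't exist)
blob = bucket.get_blob('remote/path/to/file.txt')
print(blob.download_as_bytes())  # download_as_string() is the older, deprecated name

# Overwrite that object's contents
blob.upload_from_string('New contents!')

# Upload a local file as a new object
blob2 = bucket.blob('remote/path/storage.txt')
blob2.upload_from_filename(filename='/local/path.txt')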
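The client API takes a bucket name and an object name separately, whereas the question refers to locations like gs://mybucket. A minimal sketch of bridging the two, assuming google-cloud-storage is installed and credentials are configured: split the gs:// URI, then download the object to a local file that the rest of your TensorFlow program can read. split_gcs_uri and download_gcs_uri are hypothetical helper names; client.bucket(), bucket.blob() and blob.download_to_filename() are the library's own calls.

```python
from typing import Tuple


def split_gcs_uri(uri: str) -> Tuple[str, str]:
    """Split 'gs://bucket/path/to/obj' into ('bucket', 'path/to/obj')."""
    prefix = "gs://"
    if not uri.startswith(prefix):
        raise ValueError(f"not a gs:// URI: {uri!r}")
    # Split on the first '/' only; object names may contain further slashes
    bucket_name, _, blob_name = uri[len(prefix):].partition("/")
    if not bucket_name:
        raise ValueError(f"missing bucket name in {uri!r}")
    return bucket_name, blob_name


def download_gcs_uri(uri: str, local_path: str, client=None) -> str:
    """Download one object to local_path and return local_path."""
    if client is None:
        # Imported lazily so split_gcs_uri works without the library installed
        from google.cloud import storage
        client = storage.Client()  # uses Application Default Credentials
    bucket_name, blob_name = split_gcs_uri(uri)
    # bucket() and blob() only build local references; the download is the API call
    blob = client.bucket(bucket_name).blob(blob_name)
    blob.download_to_filename(local_path)
    return local_path
```

For example, download_gcs_uri("gs://mybucket/data/train.csv", "/tmp/train.csv") fetches a training file before your program opens it. The optional client argument lets you reuse one storage.Client across many downloads.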

Upvotes: 7
