jusjosgra

Reputation: 405

Importing data from storage bucket to datalab

I am really disappointed by how much time I have spent trying to find out how to import data from Google Cloud Storage into a Datalab project in Jupyter. I have been using FloydHub and Colaboratory, and those are so much more straightforward. Why do Colaboratory and Datalab have different APIs for GCS!? It doesn't make sense. I am willing to pay for Google Cloud, but I assumed these services would be pretty seamless to use.

I have .tsv files in a subfolder of a storage bucket and I want to import them into pandas DataFrames by iterating through them. It is not clear from the documentation how to do this, which is a major oversight, since it is a basic and universal operation.

Upvotes: 2

Views: 2353

Answers (2)

An example of how you can do this:

import google.datalab.storage as st
import pandas as pd
import io

myBucket = st.Bucket('your_bucket_name')
myObject = myBucket.object('your_object_name.csv')
uri = myObject.uri

# %%gcs is a cell magic, so run it as the first line of its own cell
%%gcs read --object $uri --variable data

df = pd.read_csv(io.BytesIO(data))
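For the .tsv files in the question, the same pattern should work if you pass a tab separator, e.g. pd.read_csv(io.BytesIO(data), sep='\t').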

Upvotes: 0

Chris Meyers

Reputation: 1426

This notebook covers how to read GCS objects into python variables: https://github.com/googledatalab/notebooks/blob/master/tutorials/Storage/Storage%20APIs.ipynb

Specifically it shows the use of this API: http://googledatalab.github.io/pydatalab/google.datalab.storage.html#google.datalab.storage.Object.read_stream

The Datalab API doesn't have a method to read an object directly into a pandas DataFrame, however. That step will have to be done manually.
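As a rough sketch of that manual step (the bucket name and subfolder prefix below are placeholders, and it assumes the Bucket.objects() and Object.read_stream() calls described in the linked API docs), iterating over the asker's .tsv files might look like:

import io

import google.datalab.storage as storage
import pandas as pd

bucket = storage.Bucket('your_bucket_name')  # placeholder bucket name

frames = {}
# List every object under the subfolder prefix and load the .tsv files
for obj in bucket.objects(prefix='your_subfolder/'):  # placeholder prefix
    if not obj.key.endswith('.tsv'):
        continue
    raw = obj.read_stream()  # read the object's contents as bytes
    frames[obj.key] = pd.read_csv(io.BytesIO(raw), sep='\t')

Each DataFrame ends up keyed by its object name, which matches the iterate-and-load workflow the question describes.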

Upvotes: 2
