Reputation: 405
I am really disappointed by how much time I have spent trying to find out how to import data from Google Cloud Storage into a Datalab project in Jupyter. I have been using FloydHub and Colaboratory, and they are so much more straightforward. Why do Colaboratory and Datalab have different APIs for GCS? It doesn't make sense. I am willing to pay to use Google Cloud, but I assumed these services would be fairly seamless to use.
I have TSV files in a subfolder of a storage bucket, and I want to import them into pandas DataFrames by iterating through them. It is not clear from the documentation how to do this, which is a major oversight, since it is a basic and universal operation.
Upvotes: 2
Views: 2353
Reputation: 1
An example of how you can do this (note that %%gcs is a cell magic, so it must be the first line of its own cell):

import google.datalab.storage as st
import pandas as pd
import io

myBucket = st.Bucket('your_bucket_name')
myObject = myBucket.object('your_object_name.csv')
uri = myObject.uri  # the gs:// URI of the object

Then, in a separate cell:

%%gcs read --object $uri --variable data

And back in Python, data now holds the object's contents as bytes:

df = pd.read_csv(io.BytesIO(data))
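To extend this to the TSV files in a subfolder from the question, a rough sketch along the same lines; the bucket name and prefix below are placeholders, and it assumes Bucket.objects() accepts a prefix filter and Object.read_stream() returns the contents as bytes:

import google.datalab.storage as st
import pandas as pd
import io

bucket = st.Bucket('your_bucket_name')  # placeholder bucket name
frames = {}
for obj in bucket.objects(prefix='your_subfolder/'):  # placeholder prefix
    if obj.key.endswith('.tsv'):
        # read_stream() yields bytes; sep='\t' because the files are TSVs
        frames[obj.key] = pd.read_csv(io.BytesIO(obj.read_stream()), sep='\t')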
Upvotes: 0
Reputation: 1426
This notebook covers how to read GCS objects into Python variables: https://github.com/googledatalab/notebooks/blob/master/tutorials/Storage/Storage%20APIs.ipynb
Specifically it shows the use of this API: http://googledatalab.github.io/pydatalab/google.datalab.storage.html#google.datalab.storage.Object.read_stream
The Datalab API doesn't have a method for reading directly into a pandas DataFrame, however; that step has to be done manually.
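As a sketch of that manual step (the bucket and object names below are placeholders), read_stream() returns the object's contents as bytes, which pandas can parse through an in-memory buffer:

import google.datalab.storage as storage
import pandas as pd
import io

# Placeholder names -- substitute your own bucket and object.
obj = storage.Bucket('your_bucket_name').object('subfolder/data.tsv')
df = pd.read_csv(io.BytesIO(obj.read_stream()), sep='\t')  # sep='\t' for TSV files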
Upvotes: 2