bellotto

Reputation: 519

Read CSV from a GCP bucket, then save it back as a pickle file

I'm reading a CSV file from a bucket on Google Cloud Platform.

After reading it into a DataFrame in my Jupyter notebook, I want to save it back to the same bucket, but as a pickle file. To do that, I'm trying:

new_blob = blob.name.replace(("." + file_type), '') + '_v1'
df.to_pickle(f"gs://my_bucket/{new_blob}.pkl")

As you can see, I take the original name from the blob and strip the extension (file_type is the original extension, e.g. 'csv'); df is the DataFrame. This works if I save the file locally.

However, when I run it, no error is raised. It runs as if it worked, but when I check the bucket, the expected file isn't there. Any ideas?

Upvotes: 0

Views: 1045

Answers (1)

Piotr

Reputation: 712

You need to save the pickle file locally first and then upload it to the bucket with the google-cloud-storage client:

from google.cloud import storage

client = storage.Client()
bucket = client.bucket("my_bucket")

# Strip the original extension and append a version suffix
new_name = blob.name.replace("." + file_type, "") + "_v1"
fname = f"{new_name}.pkl"

# Write the pickle to a local file first
df.to_pickle(fname)

# Upload the local file; note the blob name must not repeat the bucket name
new_blob = bucket.blob(fname)
new_blob.upload_from_filename(filename=fname)
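
If you'd rather skip the temporary local file, you can also serialize the DataFrame in memory and upload the bytes directly. A minimal sketch, reusing client, bucket, df, and fname from above:

import pickle

# Serialize the DataFrame to bytes in memory instead of writing a temp file
data = pickle.dumps(df)

# Upload the raw bytes straight to the destination blob
bucket.blob(fname).upload_from_string(
    data, content_type="application/octet-stream"
)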

Alternatively, after saving the file locally, you may execute

$ gsutil cp file_name.pkl gs://bucket_name/file_name.pkl

This is also possible straight from Jupyter, by putting the command in a notebook cell and preceding it with an exclamation mark.
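
For example, assuming the same file and bucket names as above, a notebook cell would contain:

!gsutil cp file_name.pkl gs://bucket_name/file_name.pkl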

Upvotes: 2
