bellotto

Reputation: 519

Read CSV from a GCP bucket, then save it back as a pickle file

I'm reading a CSV file from a bucket on Google Cloud Platform.

After reading it into a DataFrame in my Jupyter notebook, I want to save it back to the same bucket, but as a pickle file. To do that, I'm trying:

new_blob = blob.name.replace(("." + file_type), '') + '_v1'
df.to_pickle(f"gs://my_bucket/{new_blob}.pkl")

As you can see, I take the original name from the blob and strip the extension (file_type is the original extension, e.g. 'csv'); df is the DataFrame. This works if I save the file locally.

However, when I run it, no error is raised. It runs as if it worked, but when I check the bucket, the expected file isn't there. Any ideas?

Upvotes: 0

Views: 1045

Answers (1)

Piotr

Reputation: 712

You need to save the pickle file locally first and then upload it to the bucket with the google-cloud-storage client:

from google.cloud import storage

client = storage.Client()
bucket = client.bucket("my_bucket")

# Strip the original extension and append a version suffix
new_name = blob.name.replace("." + file_type, "") + "_v1"
fname = f"{new_name}.pkl"

# Write the pickle to a local file first
df.to_pickle(fname)

# Upload the local file; note the blob name must not repeat the bucket name
new_blob = bucket.blob(fname)
new_blob.upload_from_filename(filename=fname)
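
If you'd rather skip the temporary local file, you can also serialize the DataFrame in memory and upload the bytes directly. A minimal sketch, reusing client, bucket, df, and fname from above:

import pickle

# Serialize the DataFrame to bytes in memory instead of writing a temp file
data = pickle.dumps(df)

# Upload the raw bytes straight to the destination blob
bucket.blob(fname).upload_from_string(
    data, content_type="application/octet-stream"
)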

Alternatively, after saving the file locally, you may execute

$ gsutil cp file_name.pkl gs://bucket_name/file_name.pkl

This is also possible straight from Jupyter, by putting the command in a notebook cell and preceding it with an exclamation mark.
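
For example, assuming the same file and bucket names as above, a notebook cell would contain:

!gsutil cp file_name.pkl gs://bucket_name/file_name.pkl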

Upvotes: 2
