Alejo Paullier

Reputation: 51

Reading and writing pickles using Google Cloud

I want to read an existing pickle (that contains a dictionary) which is stored in a folder inside a Google Cloud Bucket. Then update the pickle after performing some functions, which is equal to overwriting the pickle.

Traditionally I would do something like:

import pickle
# Read pickle:
pickle_in = open('dictionary.pickle','rb')
my_dictionary = pickle.load(pickle_in)
pickle_in.close()
my_dictionary

# MODIFY DICTIONARY BY, FOR EXAMPLE, ADDING NEW REGISTERS

# Overwrite pickle:
pickle_out = open('dictionary.pickle','wb') 
pickle.dump(my_modified_dictionary,pickle_out)
pickle_out.close()
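(For reference, the same local read-modify-write can also be written with `with` blocks so the files are closed automatically; this is a self-contained sketch that seeds a file in a temporary directory first so there is something to read:)

```python
import os
import pickle
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'dictionary.pickle')

# Seed the file so the read below has something to load.
with open(path, 'wb') as f:
    pickle.dump({'a': 1}, f)

# Read pickle:
with open(path, 'rb') as f:
    my_dictionary = pickle.load(f)

# Modify dictionary by, for example, adding new registers:
my_dictionary['b'] = 2

# Overwrite pickle:
with open(path, 'wb') as f:
    pickle.dump(my_dictionary, f)

# Verify the overwrite took effect:
with open(path, 'rb') as f:
    result = pickle.load(f)
print(result)  # {'a': 1, 'b': 2}
```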

Now I need to do something similar but on Google Cloud. So I need to change the path of the file and use cloudstorage.open():

import pickle
import cloudstorage
my_path = '/bucket_name/pickle_folder/my_dictionary.pickle'

# Read pickle:
pickle_in = cloudstorage.open(my_path,'r')
my_dictionary = pickle.load(pickle_in)
pickle_in.close()
my_dictionary

# MODIFY DICTIONARY BY, FOR EXAMPLE, ADDING NEW REGISTERS

# Overwrite pickle:
pickle_out = cloudstorage.open(my_path,'w') 
pickle.dump(my_modified_dictionary,pickle_out)
pickle_out.close()

Will this work? cloudstorage.open() seems to be the equivalent of open(), but I am not sure whether dumping the pickle to the specified path will actually overwrite the existing pickle in that folder.

Upvotes: 4

Views: 15081

Answers (2)

Paul

Reputation: 2347

Here's a slightly more Pythonic solution that also closes the file streams for you automatically:

from google.cloud import storage
import pickle

storage_client = storage.Client()
bucket = storage_client.bucket('your-gcs-bucket')
blob = bucket.blob('dictionary.pickle')

d = {'a': 1, 'b':2}

# write
with blob.open(mode='wb') as f:
    pickle.dump(d, f)

# read
with blob.open(mode='rb') as f:
    d_reloaded = pickle.load(f)

print(d_reloaded)

Upvotes: 2

David

Reputation: 9721

The basic idea of doing read-modify-write from GCS is possible. Be aware, though, that this does not work well under concurrency: if a second process reads the object before the first process writes back, then when the second process writes back it will silently lose the first process's changes. The best solution to this is to use a database rather than pickling to GCS.
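The lost-update hazard can be sketched locally, using a bytes variable to stand in for the GCS object (no bucket needed; the two "processes" are just sequential reads and writes here):

```python
import pickle

# A bytes value standing in for the pickled object stored in GCS.
stored = pickle.dumps({'count': 0})

# Two processes each read the same snapshot:
a = pickle.loads(stored)
b = pickle.loads(stored)

# Process A increments and writes back:
a['count'] += 1
stored = pickle.dumps(a)

# Process B, unaware of A's write, increments its stale copy and writes back:
b['count'] += 1
stored = pickle.dumps(b)

final = pickle.loads(stored)
print(final['count'])  # 1, not 2 -- process A's update was lost
```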

In addition, be aware that pickle is not secure, and you should not load pickles you didn't write.

If you do still want to use GCS for this you should use the standard GCS client library, something like:

from google.cloud import storage
import pickle

storage_client = storage.Client()

bucket = storage_client.bucket('your-gcs-bucket')
blob = bucket.blob('dictionary.pickle')
pickle_in = blob.download_as_string()
my_dictionary = pickle.loads(pickle_in)

# MODIFY DICTIONARY BY, FOR EXAMPLE, ADDING NEW REGISTERS

pickle_out = pickle.dumps(my_modified_dictionary)
blob.upload_from_string(pickle_out)
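Note that download_as_string() returns bytes and pickle.loads() accepts bytes, so the serialization round-trip itself can be checked locally without a bucket:

```python
import pickle

my_dictionary = {'name': 'test', 'registers': [1, 2, 3]}

# pickle.dumps produces the bytes that upload_from_string would send;
# pickle.loads parses the bytes that download_as_string returns.
payload = pickle.dumps(my_dictionary)
assert isinstance(payload, bytes)

restored = pickle.loads(payload)
print(restored == my_dictionary)  # True
```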

Upvotes: 12
