GRS

Reputation: 3084

Reading a pandas pickle file in Tensorflow in CloudML

I'm getting an error trying to read a pandas pickle (created with the df.to_pickle() method) that is stored in Google Cloud Storage. I'm trying to do the following:

path_to_gcs_file = 'gs://xxxxx'
f = file_io.FileIO(path_to_gcs_file, mode='r').read()
train_df = pd.read_pickle(f)
f.close()

I get the following error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte

Alternatively I tried:

f = BytesIO(file_io.read_file_to_string(path_to_gcs_file, binary_mode=True))
train_df = pd.read_pickle(f)

This works locally but not on CloudML! I also tried:

f = file_io.read_file_to_string(path_to_gcs_file, binary_mode=True)
train_df = pd.read_pickle(f)

This gives me an error: AttributeError: 'bytes' object has no attribute 'seek'

Upvotes: 0

Views: 949

Answers (2)

rhaertel80

Reputation: 8389

pandas.read_pickle expects a path as its first argument; you are passing a file object (file_io.FileIO) or a bytes object (from file_io.read_file_to_string) instead.
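As a rough illustration of where the two errors in the question come from, here is a sketch that uses only the standard library, with a plain dict standing in for the DataFrame (`payload` and `raw` are made-up names, not anything from the question). A binary pickle begins with the byte 0x80, which is not a valid UTF-8 start byte, so reading it in text mode fails. A bare bytes object has no seek method, whereas a BytesIO wrapper does.

```python
import io
import pickle

# A plain dict stands in for the DataFrame; any binary pickle behaves the same.
payload = {"a": [1, 2, 3]}
raw = pickle.dumps(payload, protocol=2)

# Binary pickles (protocol 2 and later) start with the PROTO opcode, byte 0x80.
first_byte = raw[0]

# Decoding those bytes as text fails, as a text-mode read (mode='r') would.
try:
    raw.decode("utf-8")
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True

# bytes has no seek(); a BytesIO wrapper provides the file-like interface.
bytes_has_seek = hasattr(raw, "seek")
restored = pickle.load(io.BytesIO(raw))
```

This suggests opening the GCS file in binary mode ('rb') rather than 'r' if you read it yourself, although whether read_pickle then accepts the resulting object depends on your pandas version.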

So far I have not found a way to read a pickle object directly from GCS using pandas, so you will have to copy it to the machine. You could use file_io.copy for that:

import pandas as pd
from tensorflow.python.lib.io import file_io
file_io.copy('gs://xxxx', '/tmp/x.pkl')
train_df = pd.read_pickle('/tmp/x.pkl')
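If you would rather not leave the copy lying around in /tmp, the same copy-then-read idea can be wrapped in a small helper. This is only a sketch: `load_pickle_via_local_copy` and its `copy_fn` parameter are hypothetical names, and the copy function is passed in so the helper itself does not depend on TensorFlow. It uses pickle.load on the local copy; pd.read_pickle(local_path) could be swapped in at that point.

```python
import os
import pickle
import shutil
import tempfile


def load_pickle_via_local_copy(remote_path, copy_fn):
    """Copy remote_path into a temporary directory and unpickle the local copy.

    copy_fn(src, dst) performs the transfer (hypothetical parameter; on
    CloudML this would be file_io.copy, locally shutil.copy works).
    """
    tmp_dir = tempfile.mkdtemp()
    try:
        # The destination does not exist yet, so no overwrite flag is needed.
        local_path = os.path.join(tmp_dir, "data.pkl")
        copy_fn(remote_path, local_path)
        with open(local_path, "rb") as f:
            return pickle.load(f)
    finally:
        # Remove the temporary copy whether or not loading succeeded.
        shutil.rmtree(tmp_dir, ignore_errors=True)
```

On CloudML the call would presumably look like train_df = load_pickle_via_local_copy('gs://xxxx', file_io.copy).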

Upvotes: 1

krflol

Reputation: 1155

You should be able to get away with using a context manager, but I think you're pulling the end of the certificate this way, so you should instead download the file through the API:

pip install --upgrade google-cloud-storage

Then

import pickle
from google.cloud import storage

# Initialise a client
storage_client = storage.Client("[Your project name here]")
# Create a bucket object for our bucket (bucket_name is your bucket's name)
bucket = storage_client.get_bucket(bucket_name)
# Create a blob object from the filepath
blob = bucket.blob("folder_one/foldertwo/filename.extension")
# Download the file to a local destination (example path, not the gs:// URL)
local_path = "/tmp/filename.extension"
blob.download_to_filename(local_path)
# Open the local copy in binary mode and unpickle it
with open(local_path, "rb") as f:
    train_df = pickle.load(f)

Much was taken from this answer: Downloading a file from google cloud storage inside a folder

Upvotes: 0
