Matthias

Reputation: 5764

Python: Saving version number and data into pickle file

I'm serializing some data into a pickle file. Unfortunately, the structure of the data might change, so I keep a static VERSION number in the code and increment it whenever the data structure changes. When the stored version differs from the current one, the data in the pickle file is invalid and should be discarded.

Therefore I tried to save a tuple consisting of the data and a version number. But restoring it from pickle raises a UnicodeDecodeError:

UnicodeDecodeError: 'ascii' codec can't decode byte 0x80 in position 0: ordinal not in range(128)

I wonder how you would include a version number? Embedding it in the file path is an option, but much more complicated. Here's my code:

#%% Create a dataframe

import pandas as pd
values = {'Latitude': {0: 47.021503365600005,
  1: 47.021503365600005,
  2: 47.021503365600005,
  3: 47.021503365600005,
  4: 47.021503365600005,
  5: 47.021503365600005},
 'Longitude': {0: 15.481974060399999,
  1: 15.481974060399999,
  2: 15.481974060399999,
  3: 15.481974060399999,
  4: 15.481974060399999,
  5: 15.481974060399999}}

df = pd.DataFrame(values)
df.head()

#%% Save the dataframe including a version number

import pickle
VERSION = 1

file_path = 'tmp.p'
with open(file_path, 'wb') as f:
    pickle.dump((df, VERSION), f)

#%% Load the dataframe including the original version number

try:
    with open(file_path, 'r') as f:
        df, version = pickle.load(f)
except ValueError as ex:
    print (ex)
    version = -1

#%% Compare version numbers

if version != VERSION:
    print('Versions do not match')

Upvotes: 1

Views: 1463

Answers (2)

olricson

Reputation: 311

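The problem is most likely the mode you used to open the file for reading. For writing you use wb (write in binary mode), but for reading you use r (text mode, the b was omitted). Open the file in binary mode for reading as well:

with open(file_path, 'rb') as f:
    df, version = pickle.load(f)

The error message in the question looks like Python 3, where text mode tries to decode the pickled bytes as text and fails on the first byte (0x80). In Python 2 the missing b is mainly an issue if you are on Windows.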

See here for more details: https://docs.python.org/2/tutorial/inputoutput.html#reading-and-writing-files
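
As a minimal sketch of the complete round trip, assuming Python 3 and the tuple layout from the question: the helper names save_versioned and load_versioned are made up for illustration, and a plain dict stands in for the DataFrame so the example runs without pandas.

```python
import os
import pickle
import tempfile

VERSION = 1  # bump whenever the stored data structure changes


def save_versioned(obj, file_path, version=VERSION):
    """Pickle (obj, version) as a tuple; 'wb' because pickle emits bytes."""
    with open(file_path, 'wb') as f:
        pickle.dump((obj, version), f)


def load_versioned(file_path, expected_version=VERSION):
    """Return the stored object, or None if it is missing, unreadable or stale."""
    try:
        with open(file_path, 'rb') as f:  # 'rb', not 'r'
            obj, version = pickle.load(f)
    except (OSError, ValueError, TypeError, EOFError,
            pickle.UnpicklingError) as ex:
        print(ex)
        return None
    if version != expected_version:
        print('Versions do not match')
        return None
    return obj


# Hypothetical stand-in for the DataFrame from the question.
data = {'Latitude': [47.0215], 'Longitude': [15.4820]}
file_path = os.path.join(tempfile.mkdtemp(), 'tmp.p')

save_versioned(data, file_path)
restored = load_versioned(file_path)                   # versions match
stale = load_versioned(file_path, expected_version=2)  # version was bumped
```

Returning None for stale or unreadable files is just one possible convention; raising an exception would work equally well if the caller should distinguish the two cases.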

Upvotes: 2

ezdazuzena

Reputation: 6770

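If you really want to store your object using pickle, you can store the version number and the pickled object together in one row of a csv file like this (obj is the object you want to store):

import base64, csv, pickle
with open('my_file.csv', 'w', newline='') as fd:
    writer = csv.writer(fd)
    # base64 turns the pickled bytes into text that is safe inside a csv cell
    writer.writerow([version_number, base64.b64encode(pickle.dumps(obj)).decode('ascii')])

You will only have one file (not two, as you put in the comment), i.e. the csv file. pickle.dumps returns the pickled object as bytes (a str on Python 2), while pickle.loads loads the object from such a bytes object, compare https://docs.python.org/3/library/pickle.html#pickle.dumps and https://docs.python.org/3/library/pickle.html#pickle.loads. The base64 step is only there so that the bytes survive being written as csv text.

Then you read the data like this:

with open('my_file.csv', newline='') as fd:
    reader = csv.reader(fd)
    row = next(reader)  # first (and only) row: [version, payload]
    fd_class = get_fd_class_by_version(row[0])
    obj = pickle.loads(base64.b64decode(row[1]))

Here get_fd_class_by_version is a kind of factory that returns the class depending on the version you stored. Note that csv hands the version back as a string, and that for large objects you may need to raise csv.field_size_limit before reading.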
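
Such a factory could be as simple as a dict lookup. The following is only a sketch: the classes RecordV1 and RecordV2 and the registry _FD_CLASSES are hypothetical placeholders for whatever data layouts your versions actually use.

```python
class RecordV1:
    """Hypothetical layout for version 1: coordinates only."""

    def __init__(self, latitude, longitude):
        self.latitude = latitude
        self.longitude = longitude


class RecordV2(RecordV1):
    """Hypothetical layout for version 2: adds an altitude field."""

    def __init__(self, latitude, longitude, altitude=0.0):
        super().__init__(latitude, longitude)
        self.altitude = altitude


# Registry mapping each stored version number to the class that understands it.
_FD_CLASSES = {1: RecordV1, 2: RecordV2}


def get_fd_class_by_version(version):
    """Return the class for `version`; accepts the string that csv returns."""
    try:
        return _FD_CLASSES[int(version)]
    except (KeyError, ValueError):
        raise ValueError('Unsupported data version: {!r}'.format(version))


fd_class = get_fd_class_by_version('2')  # value as read from row[0]
```

An unknown version raises ValueError here, which gives the caller one place to decide whether to discard the file or migrate the old data.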

Upvotes: 0
