Reputation: 5764
I'm serializing some data into a pickle file. Unfortunately, the structure of the data might change, so I keep a static VERSION number in the code and increment it whenever the data structure changes. In that case the data in the pickle file is invalid and should be discarded.
To do this, I tried to save a tuple consisting of the data and the version number. But restoring it from the pickle file raises a UnicodeDecodeError:
UnicodeDecodeError: 'ascii' codec can't decode byte 0x80 in position 0: ordinal not in range(128)
I wonder how you would include a version number? Embedding it in the file path is an option, but much more complicated. Here's my code:
#%% Create a dataframe
import pandas as pd
values = {'Latitude': {0: 47.021503365600005,
                       1: 47.021503365600005,
                       2: 47.021503365600005,
                       3: 47.021503365600005,
                       4: 47.021503365600005,
                       5: 47.021503365600005},
          'Longitude': {0: 15.481974060399999,
                        1: 15.481974060399999,
                        2: 15.481974060399999,
                        3: 15.481974060399999,
                        4: 15.481974060399999,
                        5: 15.481974060399999}}
df = pd.DataFrame(values)
df.head()
#%% Save the dataframe including a version number
import pickle
VERSION = 1
file_path = 'tmp.p'
with open(file_path, 'wb') as f:
    pickle.dump((df, VERSION), f)
#%% Load the dataframe including the original version number
try:
    with open(file_path, 'r') as f:
        df, version = pickle.load(f)
except ValueError as ex:
    print(ex)
    version = -1
#%% Compare version numbers
if version != VERSION:
    print('Versions do not match')
Upvotes: 1
Views: 1463
Reputation: 311
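The problem is most likely the mode you used to open the file for reading.
For writing you use wb (write in binary mode), but for reading you use r (text mode; the b was omitted). Open the file in binary mode for reading as well:
with open(file_path, 'rb') as f:
    df, version = pickle.load(f)
In Python 3, text mode decodes the file contents to str, which fails on binary pickle data and raises the UnicodeDecodeError on any platform. In Python 2, the missing b mainly matters on Windows.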
See here for more details: https://docs.python.org/2/tutorial/inputoutput.html#reading-and-writing-files
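To show the fix in context, here is a minimal sketch of the save/load cycle with both files opened in binary mode and outdated data discarded. The helper names save and load are hypothetical, and a plain dict stands in for the DataFrame so the example runs without pandas; the same pattern should apply to a DataFrame payload.

```python
import os
import pickle
import tempfile

VERSION = 1  # bump whenever the data structure changes


def save(path, data, version=VERSION):
    """Pickle the payload together with its version number (hypothetical helper)."""
    with open(path, "wb") as f:  # binary mode for writing
        pickle.dump((data, version), f)


def load(path, expected_version=VERSION):
    """Return the payload, or None if the file is unreadable or outdated."""
    try:
        with open(path, "rb") as f:  # binary mode for reading, too
            data, version = pickle.load(f)
    except (OSError, EOFError, pickle.UnpicklingError, ValueError, TypeError) as ex:
        print(ex)
        return None
    if version != expected_version:
        print("Versions do not match")
        return None
    return data


path = os.path.join(tempfile.mkdtemp(), "tmp.p")
payload = {"Latitude": [47.0215], "Longitude": [15.4820]}  # stand-in for the DataFrame
save(path, payload)
restored = load(path)                    # versions match: payload comes back
stale = load(path, expected_version=2)   # simulated version bump: data is discarded
```

Returning None for both a broken file and a version mismatch keeps the caller simple: in either case the data has to be rebuilt.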
Upvotes: 2
Reputation: 6770
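If you really want to store your object using pickle, you can store the version number and the pickled object together in a csv file like this (obj is the object to store):
import base64, csv, pickle
with open('my_file.csv', 'w', newline='') as fd:
    writer = csv.writer(fd)
    # pickle.dumps returns bytes in Python 3, so base64-encode them to get text for the csv cell
    writer.writerow([version_number, base64.b64encode(pickle.dumps(obj)).decode('ascii')])
You will only have one file (not two, as you put in the comment), i.e. the csv file. pickle.dumps returns the pickled object as bytes (a str in Python 2), while pickle.loads loads the object from such bytes; compare https://docs.python.org/3/library/pickle.html#pickle.dumps and https://docs.python.org/3/library/pickle.html#pickle.loads
Then you read the data like this:
with open('my_file.csv', newline='') as fd:
    reader = csv.reader(fd)
    row = next(reader)  # a csv reader is an iterator; it has no readrow() method
    fd_class = get_fd_class_by_version(row[0])  # row[0] is the version, read back as a string
    obj = pickle.loads(base64.b64decode(row[1]))
Here get_fd_class_by_version is a kind of factory that returns the class depending on the version you stored.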
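One possible benefit of keeping the version in its own csv column is that it can be checked before anything is unpickled. The following is only a sketch under that assumption: write_versioned and read_versioned are hypothetical helper names, and base64 is used here so the pickled bytes survive the trip through a text file.

```python
import base64
import csv
import os
import pickle
import tempfile


def write_versioned(path, obj, version):
    """Write one csv row: the version and the base64-encoded pickle of obj."""
    payload = base64.b64encode(pickle.dumps(obj)).decode("ascii")
    with open(path, "w", newline="") as fd:
        csv.writer(fd).writerow([version, payload])


def read_versioned(path, expected_version):
    """Return obj if the stored version matches, otherwise None."""
    with open(path, newline="") as fd:
        version, payload = next(csv.reader(fd))
    if int(version) != expected_version:
        return None  # outdated: skip unpickling entirely
    return pickle.loads(base64.b64decode(payload))


csv_path = os.path.join(tempfile.mkdtemp(), "my_file.csv")
write_versioned(csv_path, {"a": [1, 2, 3]}, version=3)
current = read_versioned(csv_path, expected_version=3)
outdated = read_versioned(csv_path, expected_version=4)
```

Note that the csv module enforces a maximum field size (see csv.field_size_limit), so a very large pickled object may need that limit raised or a different container format.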
Upvotes: 0