Python's Pickle differences between OS X and Linux (over Pandas DataFrame object)

Question

I exported a DataFrame on OS X to pickle using to_pickle.

Loading it back on OS X with read_pickle returns the same DataFrame, as expected, but loading it on a Linux system (Debian) with the same function returns different content.

From several posts it seems that pickle is guaranteed to be cross-platform when using binary mode (see: Is pickle file of python cross-platform?), but to_pickle and read_pickle don't accept any arguments for the file mode, and I couldn't tell from their documentation whether they use binary mode by default.

How can I know if they are?

How can I make sure that my pickle files will be identical across platforms?

Notes:

This is a part of the .pickle file created using to_pickle:

945d 948c 055f 6461 7461 948c 1570 616e
6461 732e 636f 7265 2e69 6e74 6572 6e61
6c73 948c 0c42 6c6f 636b 4d61 6e61 6765
7294 9394 297d 9492 9428 5d94 288c 1370

Exporting it with a b prefix (df.to_pickle(b'pickle_folder/df.pickle') as opposed to df.to_pickle('pickle_folder/df.pickle')) doesn't change its content.

Both python versions are identical (3.4.4).

EDIT

From their source code it seems that they use the highest protocol and binary reading/writing. That answers my first question. I'm still looking for the reason why the results differ between platforms.
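One way to narrow this down is to check whether the file bytes themselves differ between the two machines, or only the loaded result. The following is a minimal sketch using only the standard library: it writes a pickle in binary mode with the highest protocol (the behaviour described above) and computes a SHA-256 digest of the file. The helper names dump_binary and file_sha256 and the placeholder dict are assumptions made for illustration; they are not part of pandas.

```python
import hashlib
import os
import pickle
import tempfile


def dump_binary(obj, path):
    # "wb" opens the file in binary mode; HIGHEST_PROTOCOL selects the newest
    # pickle protocol supported by the running interpreter.
    with open(path, "wb") as fh:
        pickle.dump(obj, fh, protocol=pickle.HIGHEST_PROTOCOL)


def file_sha256(path):
    # Hash the raw bytes so the digest can be compared across machines.
    with open(path, "rb") as fh:
        return hashlib.sha256(fh.read()).hexdigest()


# Placeholder object standing in for the DataFrame (assumption for the sketch).
data = {"a": [1, 2, 3], "b": ["x", "y", "z"]}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "df.pickle")
    dump_binary(data, path)
    digest = file_sha256(path)
    with open(path, "rb") as fh:
        restored = pickle.load(fh)
```

Running file_sha256 on the same .pickle file on both OS X and Linux shows whether the file was altered in transfer. If the digests match, the difference arises at load time rather than in the file, and comparing the installed library versions on both systems might be a reasonable next step.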

MaxU - stand with Ukraine · Accepted Answer

I can't directly answer your question:

why they are different between platforms?

But as a workaround you can use the standard HDF5 format, which works on all platforms and has some nice features:

The ability to read only the part of your data that satisfies a condition, using the where='where clause' argument (the columns involved must be indexed; see the data_columns argument). You can therefore keep a huge amount of data in HDF5 files and process it in chunks, using the indexes to read each chunk into memory efficiently, i.e. you don't need to read all the data from disk in order to filter it.
The ability to compress data (for example with blosc, a very fast and fairly efficient compression algorithm).

Storing to and reading from HDF5 files can be very fast, depending on the dtypes used. NOTE: working with strings (dtype: object) can be much slower compared to the pickle format.
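A minimal sketch of the HDF5 workaround, assuming pandas with the PyTables package ("tables") installed. The column names price and qty, the key "df", and the helper hdf_roundtrip are made up for illustration. It writes a DataFrame in the queryable table format with blosc compression and reads back only the rows matching a where clause.

```python
import os
import tempfile

import pandas as pd  # to_hdf/read_hdf additionally require PyTables ("tables")


def hdf_roundtrip(df, path):
    # format="table" is needed for where= queries; data_columns lists the
    # columns that are indexed and may therefore appear in a where clause.
    df.to_hdf(
        path,
        key="df",
        mode="w",
        format="table",
        data_columns=["price"],
        complib="blosc",
        complevel=5,
    )
    # Only the rows satisfying the condition are loaded from disk.
    return pd.read_hdf(path, key="df", where="price > 2")


# Small numeric example frame (hypothetical data).
df = pd.DataFrame({"price": [1, 2, 3, 4], "qty": [10, 20, 30, 40]})

with tempfile.TemporaryDirectory() as tmp:
    filtered = hdf_roundtrip(df, os.path.join(tmp, "df.h5"))
```

Numeric columns are used here on purpose, since string columns (dtype: object) are the case noted above as potentially slower than pickle.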

Another standard option is a central database, which should be accessible from all platforms and lets you (pre-)filter and sort your data on the DB server side.
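The database option could look roughly like the sketch below. It uses an in-memory SQLite database from the standard library purely as a stand-in for a central server, which is an assumption; the table name measurements, the columns, and the helper load_filtered are likewise hypothetical. The filtering and sorting happen in the SQL query, so only the matching rows reach pandas.

```python
import sqlite3

import pandas as pd


def load_filtered(conn, min_price):
    # The WHERE and ORDER BY clauses run inside the database, so only the
    # matching, already sorted rows are transferred into the DataFrame.
    query = (
        "SELECT price, qty FROM measurements "
        "WHERE price > ? ORDER BY price DESC"
    )
    return pd.read_sql_query(query, conn, params=(min_price,))


# Hypothetical data; SQLite in memory stands in for a real DB server.
df = pd.DataFrame({"price": [1, 2, 3, 4], "qty": [10, 20, 30, 40]})
conn = sqlite3.connect(":memory:")
df.to_sql("measurements", conn, index=False)

result = load_filtered(conn, 2)
conn.close()
```

With a real server, the connection object would come from that database's driver instead of sqlite3, while the query-side filtering stays the same idea.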
