Reputation: 2564
I work with python 3.9.6 and pandas 1.3.0.
My colleague works with python 3.6.12 and pandas 1.1.5.
I want to create a dataframe and share it with my colleague, without asking them to update their environment (that request would incur some hassle).
How can I write out a dataframe to a file using my newer python/pandas versions in a way that their older python/pandas versions can read it in as a dataframe?
Default .to_pickle() method
If in the newer python environment I write:
df.to_pickle(r"C:\somepath\file.bz2")
and in the older python environment I try:
df = pd.read_pickle(r"C:\somepath\file.bz2")
I get:
ValueError: unsupported pickle protocol: 5
Specifying a protocol version in the .to_pickle() method
Fine, I thought, I'll specify a different protocol.
df.to_pickle(r"C:\somepath\file.bz2", protocol=3)
However, if in the older python environment I try to load it I get
AttributeError: module 'pandas.core.internals.blocks' has no attribute 'new_block'
This error remains for all protocol versions from 0 to 5.
Previous question on protocol version
I found this question, which only has the answer that the pandas versions must match.
I find it hard to believe that's the only solution, as then, what's the point of having multiple pickle protocols which are meant to be backward compatible?
Previous question on the new_block attribute
This question mentions the same error with the missing new_block attribute. Again, the answer is to update the pandas version (over which I have no control at the moment).
Downgrading the newer python/pandas versions
I could downgrade my newer python/pandas to match my colleague's versions.
Haven't tried it yet, but I assume that should work. However, that would really be a last resort, as then I would need a special "low version" environment to work with this one colleague.
Exporting to CSV
This works, but it loses some dataframe-specific features like data types and NaN values, so I don't consider this a valid workaround.
Pickling separately
I thought maybe the issue lies in the pandas .to_pickle() or .read_pickle() method, so I tried using the pickle library directly to write the file (using protocol 3):
import pickle
with open('file.pkl', 'wb') as f:
    pickle.dump(df, f, 3)
... and then read it in the older python environment:
import pickle
with open('file.pkl', 'rb') as f:
    df = pickle.load(f)
Unfortunately, I am still met with
AttributeError: module 'pandas.core.internals.blocks' has no attribute 'new_block'
Converting to a dict, then pickling that
Per the suggestion in the comments I tried:
ddf = df.to_dict()
with open('file.pkl', 'wb') as f:
    pickle.dump(ddf, f, 3)
But then, when I try to read it in the older environment, I get:
AttributeError: Can't get attribute '_unpickle_timestamp' on <module 'pandas._libs.tslibs.timestamps
My DataFrame has a timestamp column in it, which apparently cannot be unpickled by the older pandas version.
Upvotes: 5
Views: 2194
Reputation: 24261
what's the point of having multiple pickle protocols which are meant to be backward compatible?
The protocols govern how the pickle byte stream itself is encoded, so that different versions of Python can read the file. They do not convert objects built with a recent version of a library into something an older version of that library can reconstruct.
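To illustrate the distinction, here is a minimal sketch using only the standard library. OrderedDict stands in for a DataFrame, and referenced_globals is a hypothetical helper written for this example, not part of the pickle module. A pickle records the import path of every class it needs, and the reading side must be able to resolve that path whatever the protocol number is:

```python
import pickle
import pickletools
from collections import OrderedDict


def referenced_globals(payload):
    """List the 'module name' pairs that a protocol <= 3 pickle refers to."""
    return [
        arg
        for opcode, arg, _pos in pickletools.genops(payload)
        if opcode.name == "GLOBAL"
    ]


# Protocol 3 only changes the byte-level encoding; the class is still
# stored by reference to its import path, not by value.
payload = pickle.dumps(OrderedDict(a=1), protocol=3)
globals_needed = referenced_globals(payload)
print(globals_needed)
```

Judging by the error message in the question, a DataFrame written by pandas 1.3.0 records a reference to new_block in pandas.core.internals.blocks, which pandas 1.1.5 cannot resolve, so lowering the protocol does not help.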
it's only a minor semantic versioning difference and that should yield "functionality in a backwards compatible manner".
You're misunderstanding this. It means that code written against the earlier version will still function as expected with the new version. It does not mean that functions introduced in the newer version (or objects created with them) will work in the older one.
If this is a shared project with multiple collaborators your development should be in a virtual environment where you can load the exact requisites of the project (in terms of both python version and library/version requirements), so as to not run into conflicts with your global python install.
Both you and your colleague can work from within your venvs with full confidence that you are using compatible libraries and functionality.
It is very straightforward to set up: effectively you just create a new folder with its own python install, and any libraries installed from within the venv are stored there. This local version of python only sees these libraries. This is what a requirements.txt file for a project does: it defines the libraries and version requirements of the project.
When you are done with it you can easily delete the folder.
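Steps:
Create a virtual environment at /my/env (the --upgrade-deps flag requires Python 3.9 or later; omit it on older versions):
python -m venv /my/env --upgrade-deps
Activate it and install the project's requirements:
pip install -r requirements.txt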
You can easily create a requirements.txt file like so:
pip install pipreqs
pipreqs /path/to/project
You could also manually convert the column to a data type that the earlier version of pandas recognises (e.g. plain strings) before pickling it.
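A rough sketch of that last suggestion, combined with the to_dict approach from the question, might look like the following. The frame and its column names (when, value) are made up for illustration, and the format string assumes the timestamps carry no time zone. I have not run this against pandas 1.1.5, so treat it as a starting point rather than a verified fix:

```python
import pickle

import pandas as pd

# Hypothetical example frame; the column names are made up for illustration.
df = pd.DataFrame({
    "when": pd.to_datetime(["2021-07-01 12:30:00", None]),
    "value": [1.5, float("nan")],
})

# Writer side (newer pandas): replace Timestamps with ISO-format strings so
# the pickled dict does not reference any pandas classes.
portable = df.copy()
portable["when"] = portable["when"].dt.strftime("%Y-%m-%dT%H:%M:%S.%f")
payload = pickle.dumps(portable.to_dict(orient="list"), protocol=3)

# Reader side (older pandas): rebuild the frame, then parse the strings
# back into datetimes. Missing values come back as NaT.
restored = pd.DataFrame(pickle.loads(payload))
restored["when"] = pd.to_datetime(restored["when"])
```

In practice the payload would be written to a file with pickle.dump on one machine and read with pickle.load on the other. Other columns with pandas-specific types would need the same kind of conversion.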
Upvotes: 1