Reputation: 1064
The question of how to attach metadata to Pandas objects, and get that metadata to survive a pickle/unpickle round trip, is a perennial one. I see some very old answers, which basically say that you can't. Hopefully, a more current answer to this question will be yes. I'm using Pandas 0.23.3.
I've made some Pandas DataFrame subclasses. I think I know how to do this correctly. I have a _constructor method, and my __init__ method can handle BlockManager objects. When I create metadata attributes, I suppress the UserWarning which cautions that I'm not creating a column in the DataFrame itself, which in my case is fine.
When I want to save the DataFrame to disk, I call my_fancy_df.to_pickle(file_path). When I want to reload it, I use my_fancy_df = pandas.read_pickle(file_path). My metadata is lost in the round trip. Pandas itself has metadata which pickles and unpickles fine, such as the DataFrame.name attribute. I would like to copy this behavior for my attributes.
I could intercept the .to_pickle call in my subclass, and arrange to write the metadata separately into the same file object. But I don't see an equivalent approach for changing the way that data is reloaded. The read_pickle function is general-purpose and lives in the Pandas namespace; it doesn't belong to the DataFrame class.
I could possibly write a custom unpickling function, external to my class and use that... it seems clumsy. If there's an elegant way to get this job done, I haven't found it.
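For reference, the external save/load workaround described above could look roughly like the sketch below. It uses the standard-library pickle module directly rather than to_pickle/read_pickle. The helper names save_with_meta and load_with_meta, the payload keys, and the sample data are all made up for illustration.

```python
import os
import pickle
import tempfile

import pandas as pd


def save_with_meta(df, meta, path):
    """Pickle the frame and a metadata dict together as a single payload."""
    with open(path, "wb") as fh:
        pickle.dump({"frame": df, "meta": meta}, fh,
                    protocol=pickle.HIGHEST_PROTOCOL)


def load_with_meta(path):
    """Return (frame, meta) from a file written by save_with_meta."""
    with open(path, "rb") as fh:
        payload = pickle.load(fh)
    return payload["frame"], payload["meta"]


# Hypothetical data: object cells (tuples) are fine because pickle handles them.
frame = pd.DataFrame({"a": [1, 2], "b": [(1, 2), (3, 4)]})
meta = {"source": "example", "version": 3}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "fancy.pkl")
    save_with_meta(frame, meta, path)
    loaded_frame, loaded_meta = load_with_meta(path)
```

This works, but the metadata lives beside the DataFrame rather than on it, so every caller has to use the paired helpers instead of pandas.read_pickle.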
I'm also not dead-set on using pickle. If HDF5 is more suitable, for example, I can switch. I do need to pickle arbitrary Python data types in the DataFrame, though. The content in the cells is not just strings and numbers; I have tuples as well, and in one subclass I've built I even placed DataFrames inside DataFrames.
Thanks for your advice.
Upvotes: 2
Views: 921
Reputation: 1064
The comment from user "root" was helpful. I have confirmed that if you define a class attribute called _metadata in your custom DataFrame subclass, containing the names of the instance attributes you want to keep, those attributes are retained through slicing, pickling, and unpickling operations.
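Here is a minimal sketch of that approach. The class name FancyFrame, the attribute names source and notes, and the sample values are hypothetical. I would expect it to behave the same way on recent Pandas versions, but I have only confirmed the behavior on the version mentioned in the question.

```python
import os
import tempfile

import pandas as pd


class FancyFrame(pd.DataFrame):
    # Names of the instance attributes Pandas should carry along.
    _metadata = ["source", "notes"]

    @property
    def _constructor(self):
        # Make Pandas operations return FancyFrame instead of DataFrame.
        return FancyFrame


df = FancyFrame({"a": [1, 2, 3], "b": [(1, 2), (3, 4), (5, 6)]})
df.source = "sensor-7"          # listed in _metadata, so no UserWarning
df.notes = {"units": "mm"}

# Slicing keeps the attributes.
subset = df[df["a"] > 1]

# So does a to_pickle / read_pickle round trip.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "fancy.pkl")
    df.to_pickle(path)
    restored = pd.read_pickle(path)
```

Note that the attribute values are handed over by reference when slicing, so a mutable value such as the notes dict is shared between df and subset.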
Upvotes: 1