John Ladasky

Reputation: 1064

Pickling Pandas DataFrame subclasses which include metadata

The question of how to attach metadata to Pandas objects, and get that data to survive a pickle/unpickle round trip, is a perennial one. I see some very old answers, which basically say that you can't. Hopefully, a more current answer to this question will be yes. I'm using Pandas 0.23.3.

I've made some Pandas DataFrame subclasses. I think I know how to do this correctly. I have a _constructor property, and my __init__ method can handle BlockManager objects. When I create metadata attributes, I suppress the UserWarning which cautions that I'm not creating a column in the DataFrame itself, which in my case is fine.

When I want to save the DataFrame to disk, I call my_fancy_df.to_pickle(file_path). When I want to reload it, I use my_fancy_df = pandas.read_pickle(file_path). My own metadata is lost in the round trip. Pandas itself has metadata which pickles and unpickles fine, such as the Series.name attribute. I would like to copy this behavior for my attributes.

I could intercept the .to_pickle call in my subclass, and arrange to write the metadata separately into the same file object. But I don't see an equivalent approach for changing the way that data is reloaded. The read_pickle function is general-purpose and lives in the Pandas namespace; it doesn't belong to the DataFrame class.

I could write a custom unpickling function, external to my class, and use that, but it seems clumsy. If there's an elegant way to get this job done, I haven't found it.
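To make the workaround described above concrete, here is a rough sketch of what I mean: two pickles written back to back into one file, and an external loader that reads them in the same order. The helper names save_with_meta and load_with_meta are made up for illustration (they are not part of Pandas), and I use a plain DataFrame here rather than my subclass.

```python
import os
import pickle
import tempfile

import pandas as pd


def save_with_meta(frame, meta, path):
    # Hypothetical helper: write the frame, then the metadata,
    # as two consecutive pickles in the same file.
    with open(path, "wb") as fh:
        pickle.dump(frame, fh)
        pickle.dump(meta, fh)


def load_with_meta(path):
    # Hypothetical helper: read the two pickles back in the same order.
    with open(path, "rb") as fh:
        frame = pickle.load(fh)
        meta = pickle.load(fh)
    return frame, meta


# Cells may hold arbitrary picklable objects, such as tuples.
original = pd.DataFrame({"pair": [(1, 2), (3, 4)], "value": [0.5, 1.5]})
original_meta = {"units": "mm", "source": "sensor-7"}

path = os.path.join(tempfile.mkdtemp(), "fancy.pkl")
save_with_meta(original, original_meta, path)
loaded, loaded_meta = load_with_meta(path)
```

This works, but every caller has to know to use the custom loader instead of pandas.read_pickle, which is the clumsiness I'd like to avoid.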

I'm also not dead-set on using pickle. If HDF5 is more suitable, for example, I can switch. I do need to pickle arbitrary Python data types in the DataFrame, though. The content in the cells is not just strings and numbers, I have tuples as well, and in one subclass I've built I even placed DataFrames inside DataFrames.

Thanks for your advice.

Upvotes: 2

Views: 921

Answers (1)

John Ladasky

Reputation: 1064

The comment from user "root" was helpful. I have confirmed that if you define a class attribute called _metadata inside your custom DataFrame subclass, and set it to a list of the names of the instance attributes you want to keep, those attributes are retained through slicing, pickling, and unpickling operations.
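A minimal sketch of what this looks like, assuming a reasonably recent Pandas. The class name FancyFrame and the attribute names units and source are placeholders I chose for the example, not anything Pandas defines.

```python
import pickle

import pandas as pd


class FancyFrame(pd.DataFrame):
    # Names of the instance attributes that Pandas should carry along.
    # "units" and "source" are placeholder names for this example.
    _metadata = ["units", "source"]

    @property
    def _constructor(self):
        # Make Pandas operations return FancyFrame, not a plain DataFrame.
        return FancyFrame


df = FancyFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
df.units = "mm"          # no UserWarning, because "units" is in _metadata
df.source = "sensor-7"

# Round trip through pickle; to_pickle/read_pickle use the same mechanism.
restored = pickle.loads(pickle.dumps(df))

# Taking the first rows goes through _constructor and keeps the attributes.
sliced = df.head(2)
```

After the round trip, restored is still a FancyFrame and restored.units and restored.source hold the values set above. The same goes for sliced. No custom pickling or unpickling code is needed.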

Upvotes: 1
