sanguineturtle

Reputation: 1455

Pandas DataFrame Object Inheritance or Object Use?

I am building a library for working with a very specific kind of structured data, with Pandas as the underlying infrastructure. I am currently writing several data containers for different use cases, such as a CTMatrix for Country x Time data, each housing the methods appropriate to that structure.

I am currently debating between

Option 1: Object Inheritance

import pandas as pd

class CTMatrix(pd.DataFrame):
    pass  # methods etc. here

or Option 2: Object Use

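class CTMatrix(object):
    def __init__(self, *args, **kwargs):
        # Hold a DataFrame instance rather than the DataFrame class itself
        self._data = pd.DataFrame(*args, **kwargs)

    # then use getter, setter methods to control access to _data etc.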

From a software engineering perspective is there an obvious choice here?

My thoughts so far are:

Option 1:

  1. Can use DataFrame methods directly on a CTMatrix (like CTMatrix.sort_values()) without having to re-expose them through methods that delegate to the encapsulated _data object, as Option #2 requires
  2. Updates and new methods in Pandas are inherited automatically, except for methods that are overridden by local class methods

But

  1. Complications with some methods such as __init__(), where the arguments have to be passed up to the superclass: super(CTMatrix, self).__init__(*args, **kw)

Option 2:

  1. More control over the class and its behavior
  2. Possibly more resilient to updates in Pandas?

But

  1. Having to use a getter or a non-hidden attribute to use the object like a DataFrame, such as CTMatrix.data.sort_values()

Are there any additional downsides for taking the approach in Option #1?

Upvotes: 9

Views: 4805

Answers (2)

mcocdawc

Reputation: 1867

I ran into similar issues and also wanted to inherit from a pandas DataFrame, so, following Matti John's answer, I wrote a _pandas_wrapper class for a project of mine.

https://github.com/mcocdawc/chemcoord/blob/bdfc186f54926ef356d0b4830959c51bb92d5583/src/chemcoord/_generic_classes/_pandas_wrapper.py

The only purpose of this class is to give a pandas DataFrame lookalike that is safe to inherit from.

If your project is LGPL licensed you can reuse it without problems.

Upvotes: 1

Matti John

Reputation: 20477

I would avoid subclassing DataFrame, because many of the DataFrame methods will return a new DataFrame and not another instance of your CTMatrix object.
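A minimal sketch of that behavior, assuming a reasonably recent pandas release. `PlainMatrix`, `KeptMatrix` and `n_countries` are hypothetical names used only for illustration. The second class overrides the `_constructor` property that the pandas documentation describes for subclasses, which lets methods return the subclass again; it softens the problem but does not remove the coupling to Pandas.

```python
import pandas as pd


class PlainMatrix(pd.DataFrame):
    """Naive subclass (hypothetical): adds a method, overrides nothing."""

    def n_countries(self):
        return len(self.index)


class KeptMatrix(pd.DataFrame):
    """Subclass (hypothetical) that overrides the _constructor hook."""

    @property
    def _constructor(self):
        # Tell pandas which class to build when a method returns a new frame.
        return KeptMatrix

    def n_countries(self):
        return len(self.index)


data = {"2000": [2.0, 1.0], "2001": [4.0, 3.0]}
plain = PlainMatrix(data, index=["USA", "AUS"])
kept = KeptMatrix(data, index=["USA", "AUS"])

# Both calls go through the inherited DataFrame.sort_values().
plain_sorted = plain.sort_values("2000")
kept_sorted = kept.sort_values("2000")

print(type(plain_sorted).__name__)  # the custom type is lost here
print(type(kept_sorted).__name__)   # the custom type is preserved here
```

With the naive subclass, any custom method such as `n_countries` is no longer available on the sorted result, which is the failure mode described above.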

There are a few open issues on GitHub around this.

More generally, this is a question of composition vs inheritance. I would be especially wary of benefit #2 of Option 1 (automatically inheriting updates and new methods from Pandas). It might seem great now, but unless you keep a close eye on updates to Pandas (and it is a fast-moving target), you can easily end up with unexpected consequences, and your code will end up intertwined with Pandas.
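As a rough sketch of the composition route (Option 2 in the question), assuming the container only needs to expose a handful of operations: the `data` property and the `sort_by_year` and `countries` methods below are hypothetical and not part of any existing library. Each wrapper method decides explicitly what is exposed and what type comes back.

```python
import pandas as pd


class CTMatrix:
    """Country x Time container that wraps a DataFrame (composition)."""

    def __init__(self, data):
        # Copy so later changes to the caller's frame do not leak in.
        self._data = pd.DataFrame(data).copy()

    @property
    def data(self):
        """Return a copy of the wrapped frame for ad hoc pandas work."""
        return self._data.copy()

    def sort_by_year(self, year):
        """Return a new CTMatrix sorted by the values in one year column."""
        return CTMatrix(self._data.sort_values(year))

    def countries(self):
        """Return the country labels in their current row order."""
        return list(self._data.index)


frame = pd.DataFrame({"2000": [2.0, 1.0]}, index=["USA", "AUS"])
m = CTMatrix(frame)
s = m.sort_by_year("2000")

print(type(s).__name__, s.countries())
```

The cost is the one the question already names: every DataFrame operation you want has to be wrapped or reached through `data`. The benefit is that a change in Pandas can only affect the methods you chose to delegate.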

Upvotes: 5
