shortorian

Reputation: 1182

How do I avoid subclassing a pandas DataFrame using composition?

The pandas documentation recommends against sub-classing their data structures. One of their recommended alternatives is to use composition, but they just point readers to a Wikipedia article on composition vs. inheritance. That article and other resources I've found have not helped me understand how to extend a pandas DataFrame using composition. Can someone explain composition in this context and tell me about cases where composition might be a preferred alternative to sub-classing pd.DataFrame? A simple example or a link to information that's more instructive than Wikipedia articles would be very helpful.

In this question I'm specifically asking how composition should be used in cases where someone might be tempted to subclass pd.DataFrame. I understand there are other solutions to extending a Python object that do not involve composition, and I asked another question about extending pandas DataFrames that resulted in a different solution using a wrapper class.


I didn't understand that "wrapping" and "composition" refer to the same approach here, as noted in MaxYarmolinsky's answer below. The answer to the question I linked to above has a more complete discussion about using composition in this case, which may require handling __getattr__, __getitem__, and __setitem__ properly (I realize this is obvious to people who know what they're doing, but I had to ask my previous question because I had failed to get/set items when I tried on my own).
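For readers landing here, a minimal sketch of what handling those three methods might look like. The class name `FrameWrapper` and the `label` attribute are hypothetical, chosen only for illustration; this is one possible arrangement, not the code from the linked answer.

```python
import pandas as pd


class FrameWrapper:
    """Hypothetical wrapper that holds a DataFrame instead of inheriting from it."""

    def __init__(self, data, label=None):
        self._df = pd.DataFrame(data)  # the wrapped ("composed") DataFrame
        self.label = label             # an extra attribute of our own

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails, so the wrapper's own
        # attributes win and everything else is forwarded to the DataFrame.
        if name == "_df":
            # Avoid infinite recursion if _df has not been set yet
            raise AttributeError(name)
        return getattr(self._df, name)

    def __getitem__(self, key):
        # Forward wrapper[key] to the DataFrame
        return self._df[key]

    def __setitem__(self, key, value):
        # Forward wrapper[key] = value to the DataFrame
        self._df[key] = value


wrapped = FrameWrapper({"a": [1, 2, 3]}, label="demo")
wrapped["b"] = wrapped["a"] * 2   # uses __getitem__ and __setitem__
total = int(wrapped["b"].sum())   # column sum of the new column
shape = wrapped.shape             # forwarded to the DataFrame by __getattr__
```

The item methods have to be defined explicitly because Python looks up special methods such as `__getitem__` on the class, so `__getattr__` alone never sees `wrapped["a"]`.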

Upvotes: 1

Views: 576

Answers (2)

Greg

Reputation: 63

In OOP, inheritance models an "is-a" relationship, whereas composition models a "has-a" relationship.

In general, you should reach for composition over inheritance unless you have a specific polymorphic design in mind, because composition is less tightly coupled and more modular. Inheritance is the strongest form of coupling, and strong coupling leads to maintenance difficulties: everything is connected and hard to separate. Composition is much easier to refactor.

Inheritance can also lead to confusing class hierarchies if care is not taken with the design, or if the design grows incrementally.

That said, don't be afraid to use inheritance for polymorphism, but be wary of using it for simple code reuse.

Upvotes: 2

MaxYarmolinsky

Reputation: 1139

A little googling will show you how to create a simple class like the one you describe through composition.

  import pandas as pd

  class mydataframe:
      def __init__(self, data):
          # Hold the DataFrame as an attribute ("has-a") rather than inheriting from it
          self.coredataframe = pd.DataFrame(data)
          self.otherattribute = None

Then you can add methods and attributes of your own...
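As a rough sketch of what adding your own methods and attributes could look like: the class name `Measurements`, the `units` attribute, and both methods are hypothetical and only illustrate the pattern.

```python
import pandas as pd


class Measurements:
    """Hypothetical class that has a DataFrame rather than being one."""

    def __init__(self, data, units=None):
        self.coredataframe = pd.DataFrame(data)
        self.units = units  # custom attribute stored next to the data

    def column_means(self):
        # Custom method built on top of the wrapped DataFrame
        return self.coredataframe.mean()

    def add_column(self, name, values):
        # Modify the wrapped DataFrame and return self to allow chaining
        self.coredataframe[name] = values
        return self


m = Measurements({"x": [1.0, 2.0, 3.0]}, units="cm")
m.add_column("y", [2.0, 4.0, 6.0])
means = m.column_means()  # a pandas Series indexed by column name
```

With this layout, any pandas functionality you have not wrapped is still reachable through `m.coredataframe`, at the cost of the caller knowing about that attribute.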

Upvotes: 2
