user13132640

Reputation: 349

How to implement a custom equality comparison which can test pd.DataFrame attributes?

I have a custom dataclass, which is rather large (many attributes and methods). Some attributes are pandas DataFrames. The default __eq__ comparison does not work for the attributes that are DataFrames, so I started writing a custom __eq__ method to handle this. I came up with this, which seems to work:

    def __eq__(self, other):
        if not isinstance(other, type(self)):
            return NotImplemented
        # Compare public, non-callable attributes only
        attribs = [a for a in dir(self)
                   if not a.startswith('_') and not callable(getattr(self, a))]
        for a in attribs:
            mine, theirs = getattr(self, a), getattr(other, a)
            if isinstance(mine, pd.DataFrame):
                same = mine.equals(theirs)
            else:
                same = (mine == theirs)
            if not same:
                return False
        return True

I'm testing by creating a class instance, saving it to a pickle file, and then reading that file back into a new variable.

However, it left me with two questions:

  1. I had to add the limitation that no callables are compared, because when I compare the callables I get False even though the instances are the same (nothing in the output made it clear to me why the result is False). How does the default dataclass __eq__ actually handle this? Are callables also ignored?

  2. I assume there is a faster, vectorized way to check the attributes without having to use this for loop approach, but I don't see exactly how it would work. Any thoughts?
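For reference, a minimal sketch touching on both questions, under stated assumptions rather than as a definitive answer. The dataclass-generated __eq__ compares only the declared fields (as tuples) and never looks at methods, so one option is to iterate over dataclasses.fields() instead of dir(). The class Report, its fields, and the helper _values_equal are hypothetical names made up for illustration, and copy.deepcopy stands in for the pickle round trip. The sketch also assumes Python 3.8+, where bound methods of two distinct instances compare unequal because their __self__ objects are compared by identity.

```python
import copy
from dataclasses import dataclass, fields

import pandas as pd


def _values_equal(a, b):
    """Hypothetical helper: use .equals() for pandas objects, == otherwise."""
    if isinstance(a, (pd.DataFrame, pd.Series)):
        return type(a) is type(b) and a.equals(b)
    return a == b


@dataclass(eq=False)  # eq=False: no __eq__ is generated, the one below is used
class Report:  # hypothetical example class
    name: str
    table: pd.DataFrame

    def summary(self):
        return f"{self.name}: {len(self.table)} rows"

    def __eq__(self, other):
        if other.__class__ is not self.__class__:
            return NotImplemented
        # Only declared fields are compared; methods are never fields.
        return all(
            _values_equal(getattr(self, f.name), getattr(other, f.name))
            for f in fields(self)
            if f.compare
        )


original = Report("demo", pd.DataFrame({"x": [1, 2, 3]}))
clone = copy.deepcopy(original)  # stands in for the pickle round trip

print(original == clone)                  # field-by-field comparison
print(original.summary == clone.summary)  # bound methods of two distinct instances
```

On the second question, the loop only runs once per field, so it is unlikely to be the bottleneck; the element-wise work already happens inside DataFrame.equals. Fields holding NumPy arrays would need their own branch in the helper, since == on arrays does not return a single bool.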

Upvotes: 0

Views: 28

Answers (0)
