piRSquared

Reputation: 294258

Some operations not respecting custom attributes in Series subclass

According to https://pandas.pydata.org/pandas-docs/stable/internals.html,
I should be able to subclass a pandas Series.

My MCVE is

from pandas import Series


class Xseries(Series):
    _metadata = ['attr']

    @property
    def _constructor(self):
        return Xseries

    def __init__(self, *args, **kwargs):
        self.attr = kwargs.pop('attr', 0)
        super().__init__(*args, **kwargs)

s = Xseries([1, 2, 3], attr=3)

Notice that the attr attribute is:

s.attr

3

However, when I multiply by 2, I get:

(s * 2).attr

0

That is the default value, so attr was not passed on. You may ask whether passing it on is even the intended behavior. I think it is, according to the documentation: https://pandas.pydata.org/pandas-docs/stable/internals.html#define-original-properties

And if we use the mul method, it seems to work:

s.mul(2).attr

3

But this doesn't (it is the same call that s * 2 makes):

s.__mul__(2).attr

0

I wanted to run this past SO before creating an issue on GitHub. Is this a bug?

Is there a workaround?

I need to be able to do s * 2 and have the attr attribute passed on to the result.

Upvotes: 4

Views: 81

Answers (2)

piRSquared

Reputation: 294258

I will delete this answer if @chrisb posts a similar one.


As posted by @chrisb here, this is an open issue.

Matthiasha posted a workaround that is recreated below using my example from the question.

from pandas import Series


class Xseries(Series):
    _metadata = ['attr']

    @property
    def _constructor(self):
        def _c(*args, **kwargs):
            # workaround for https://github.com/pandas-dev/pandas/issues/13208
            return Xseries(*args, **kwargs).__finalize__(self)
        return _c

    def __init__(self, *args, **kwargs):
        self.attr = kwargs.pop('attr', 0)
        super().__init__(*args, **kwargs)

And now the problem is solved:

(Xseries([1, 2, 3], attr=3) * 2).attr

3
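
Because the wrapper lives in _constructor rather than in __mul__, it should in principle cover any arithmetic operation that builds its result through _constructor. The following is a quick check of that, not a guarantee: which code paths go through _constructor is a pandas internal and may differ between versions. The `results` dictionary is only there for illustration.

```python
from pandas import Series


class Xseries(Series):
    _metadata = ['attr']

    @property
    def _constructor(self):
        def _c(*args, **kwargs):
            # Same workaround as above: build the result, then copy
            # the metadata over from the originating object.
            return Xseries(*args, **kwargs).__finalize__(self)
        return _c

    def __init__(self, *args, **kwargs):
        self.attr = kwargs.pop('attr', 0)
        super().__init__(*args, **kwargs)


s = Xseries([1, 2, 3], attr=3)

# A few operators, including a reflected one (2 * s calls __rmul__).
results = {
    's * 2': s * 2,
    '2 * s': 2 * s,
    's + 1': s + 1,
    's - s': s - s,
}

for label, result in results.items():
    print(label, type(result).__name__, result.attr)
```

If any of these prints 0 on your pandas version, that operation is not routed through _constructor there and needs separate handling.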

Upvotes: 0

Sraw

Reputation: 20214

If you use inspect.getsourcelines to check the source code of these two functions mul and __mul__, you will find they actually have different implementations.
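
A minimal way to run that comparison yourself is sketched below. The variable names are just for illustration, and what you see depends on the installed pandas version, since both methods are generated or decorated differently across releases.

```python
import inspect

from pandas import Series

# getsourcelines returns a (list_of_source_lines, starting_line_number) pair.
mul_lines, mul_start = inspect.getsourcelines(Series.mul)
dunder_lines, dunder_start = inspect.getsourcelines(Series.__mul__)

print("identical source:", mul_lines == dunder_lines)

# Show the first few lines of each to compare them by eye.
print("".join(mul_lines[:5]))
print("".join(dunder_lines[:5]))
```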

And s.mul(2).attr still doesn't really work either: it only uses __finalize__ to copy the attributes over, and does not multiply attr itself.

Or maybe I am misunderstanding your question, and you just want attr to be propagated, not multiplied as well?

If so, you can override __mul__ in your subclass so that it calls __finalize__.

from pandas import Series


class Xseries(Series):
    _metadata = ['attr']

    @property
    def _constructor(self):
        return Xseries

    def __init__(self, *args, **kwargs):
        self.attr = kwargs.pop('attr', 0)
        super().__init__(*args, **kwargs)

    def __mul__(self, other):
        internal_result = super().__mul__(other)
        return internal_result.__finalize__(self)

s = Xseries([1, 2, 3], attr=3)
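
Note that overriding __mul__ alone only covers s * 2. An expression like 2 * s goes through __rmul__, and other operators have their own methods. One way to apply the same pattern to several operators at once is sketched here. The helper `_finalized` and the list of operator names are my own illustration, not part of pandas.

```python
from pandas import Series


class Xseries(Series):
    _metadata = ['attr']

    @property
    def _constructor(self):
        return Xseries

    def __init__(self, *args, **kwargs):
        self.attr = kwargs.pop('attr', 0)
        super().__init__(*args, **kwargs)


def _finalized(name):
    # Hypothetical helper: wrap the parent operator method so that its
    # result gets the metadata of `self` via __finalize__.
    def method(self, other):
        result = getattr(super(Xseries, self), name)(other)
        if result is NotImplemented:
            # Let Python fall back to the other operand's method.
            return result
        return result.__finalize__(self)
    method.__name__ = name
    return method


# Extend this tuple with whichever operators you need.
for _name in ('__mul__', '__rmul__', '__add__', '__radd__'):
    setattr(Xseries, _name, _finalized(_name))

s = Xseries([1, 2, 3], attr=3)
print((s * 2).attr, (2 * s).attr, (s + 1).attr)
```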

If not, you can multiply attr manually and return the result.

from pandas import Series


class Xseries(Series):
    _metadata = ['attr']

    @property
    def _constructor(self):
        return Xseries

    def __init__(self, *args, **kwargs):
        self.attr = kwargs.pop('attr', 0)
        super().__init__(*args, **kwargs)

    def __mul__(self, other):
        internal_result = super().__mul__(other)
        if hasattr(other, "attr"):
            internal_result.attr = self.attr * other.attr
        else:
            internal_result.attr = self.attr * other
        return internal_result

s = Xseries([1, 2, 3], attr=3)
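
For reference, here is a short check of what this second variant produces, repeating the class so the snippet runs on its own. The names `scaled` and `squared` are just for illustration.

```python
from pandas import Series


class Xseries(Series):
    _metadata = ['attr']

    @property
    def _constructor(self):
        return Xseries

    def __init__(self, *args, **kwargs):
        self.attr = kwargs.pop('attr', 0)
        super().__init__(*args, **kwargs)

    def __mul__(self, other):
        internal_result = super().__mul__(other)
        if hasattr(other, "attr"):
            # Both operands carry attr: multiply the two values.
            internal_result.attr = self.attr * other.attr
        else:
            # Otherwise treat `other` as the multiplier for attr.
            internal_result.attr = self.attr * other
        return internal_result


s = Xseries([1, 2, 3], attr=3)

scaled = s * 2   # scalar operand: attr becomes 3 * 2
squared = s * s  # Xseries operand: attr becomes 3 * 3

print(scaled.attr, squared.attr)
```

One caveat: a plain Series has no attr, so multiplying by one takes the else branch and attr ends up being a Series rather than a number. Add an explicit check if that case matters to you.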

Upvotes: 2
