manoelpqueiroz

Reputation: 637

Store and apply a class's bound method as a variable?

In Python, a function is also an object, so it can be stored in a variable:

In [1]: f = sum

In [2]: f([1, 2, 3])
Out[2]: 6

Is it possible to achieve a similar behaviour with a class's bound method?


In practical terms, I have a function where I chain several pandas DataFrame operations to get the data into my desired format:

import pandas as pd

def myfunc(df: pd.DataFrame) -> pd.DataFrame:
    grouped = df\
        .set_index('date')\
        .groupby('class')\
        .resample('M')['value_col']\
        .count()\
        .reset_index()\
        .sort_values('date')

    return grouped

If this also worked for a class's method, independent of the instance, functions like this could be written more flexibly:

def myfunc(df: pd.DataFrame, f: str) -> pd.DataFrame:
    if f == 'sum':
        FUNC = SumMethodOfResamplerGroupby
    else:
        FUNC = CountMethodOfResamplerGroupby

    grouped = df\
        .set_index('date')\
        .groupby('class')\
        .resample('M')['value_col']\
        .FUNC()\
        .reset_index()\
        .sort_values('date')

    return grouped

Compare that to how I need to implement this function currently, which seems much less pythonic:

def myfunc(df: pd.DataFrame, f: str) -> pd.DataFrame:
    if f == 'sum':
        grouped = df\
            .set_index('date')\
            .groupby('class')\
            .resample('M')['value_col']\
            .sum()
    else:
        grouped = df\
            .set_index('date')\
            .groupby('class')\
            .resample('M')['value_col']\
            .count()

    return grouped.reset_index().sort_values('date')

When I access the sum and count methods in this case without parentheses (to get the method object itself rather than its result), Python reports them as bound method f and bound method Resampler.count, respectively, in case that is relevant (although type returns method for both, of course).
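For reference, here is a minimal sketch of the distinction between a bound method and the plain function stored on the class. The Counter class is hypothetical and stands in for the pandas resampler only for illustration:

```python
class Counter:
    """Hypothetical class used only for illustration."""

    def __init__(self, items):
        self.items = list(items)

    def count(self):
        return len(self.items)


c = Counter("abc")

bound = c.count          # bound method: remembers the instance it came from
unbound = Counter.count  # plain function: the instance must be passed in

print(type(bound).__name__)    # method
print(type(unbound).__name__)  # function
print(bound.__self__ is c)     # True
print(bound(), unbound(c))     # 3 3
```

Both objects can be stored in a variable and called later; the only difference is whether the instance is already attached.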

Upvotes: 0

Views: 157

Answers (1)

Tim Roberts

Reputation: 54726

Well, it's more work than you've implied. In order to implement the pipeline as you have it:

    grouped = df\
        .set_index('date')\
        .groupby('class')\
        .resample('M')['value_col']\
        .FUNC()\
        .reset_index()\
        .sort_values('date')

FUNC would have to be a member of pandas.DataFrame, which is not what you want. You CAN add methods to an object by using what is called "monkey-patching". As in:

pd.DataFrame.SumMethodOfResamplerGroupBy = SumMethodOfResamplerGroupBy
pd.DataFrame.CountMethodOfResamplerGroupBy = CountMethodOfResamplerGroupBy

Now you could do, for example:

    grouped = df\
        .set_index('date')\
        .groupby('class')\
        .resample('M')['value_col']\
        .SumMethodOfResamplerGroupBy()\
        .reset_index()\
        .sort_values('date')

In order to call FUNC, as you have it, you would have to create a FUNC function in your object:

def myfunc(df: pd.DataFrame, f: str) -> pd.DataFrame:
    if f == 'sum':
        df.FUNC = SumMethodOfResamplerGroupby
    else:
        df.FUNC = CountMethodOfResamplerGroupby

Which is maybe not as ugly as it looks.
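An alternative that avoids patching altogether is to look the method up by name, or to take the function from the class and pass the instance yourself. The sketch below is only an illustration: Bucket is a hypothetical stand-in for the object that the resample step returns, and aggregate is a made-up helper name.

```python
from operator import methodcaller


class Bucket:
    """Hypothetical stand-in for the resampler object in the pipeline."""

    def __init__(self, values):
        self.values = list(values)

    def sum(self):
        return sum(self.values)

    def count(self):
        return len(self.values)


def aggregate(bucket: Bucket, f: str):
    """Call the aggregation named by f on bucket."""
    if f not in ("sum", "count"):
        raise ValueError(f"unsupported aggregation: {f!r}")
    # getattr on the instance returns an already-bound method.
    bound = getattr(bucket, f)
    return bound()


bucket = Bucket([1, 2, 3])

# Option 1: look the method up by name.
by_name = aggregate(bucket, "sum")

# Option 2: store the function from the class, pass the instance explicitly.
FUNC = Bucket.count
by_class = FUNC(bucket)

# Option 3: methodcaller builds a callable that invokes the named method.
call_sum = methodcaller("sum")
by_caller = call_sum(bucket)

print(by_name, by_class, by_caller)  # 6 3 6
```

In the pandas pipeline the same idea would presumably read getattr(resampled, f)(), where resampled is whatever .resample('M')['value_col'] returns; I have not run that against the data in the question, so treat it as an assumption.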

Followup

In testing my suggestion, I see the pd.DataFrame monkey-patching works just as I expect. The df monkey-patching does not; the function does not appear to be bound; it doesn't pass the automatic self. I don't quite understand that, but I'm looking.
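A likely explanation, sketched with a hypothetical Frame class instead of pandas: Python only binds a function to an instance when the function is found on the class, because that is where the descriptor protocol applies. A function stored directly on an instance is returned as-is, so no self is passed. types.MethodType can do the binding by hand.

```python
import types


class Frame:
    """Hypothetical stand-in for a pandas object."""

    def __init__(self, values):
        self.values = list(values)


def total(self):
    return sum(self.values)


# Patching the class: lookup goes through the function's __get__,
# so the instance is passed as self automatically.
Frame.total = total
a = Frame([1, 2, 3])
class_result = a.total()

# Patching the instance: the function sits in the instance __dict__,
# no binding happens, and calling it without arguments raises TypeError.
b = Frame([4, 5, 6])
b.FUNC = total
try:
    b.FUNC()
    instance_error = None
except TypeError as exc:
    instance_error = exc

# Binding by hand makes the instance-level attribute behave like a method.
b.FUNC = types.MethodType(total, b)
bound_result = b.FUNC()

print(class_result, type(instance_error).__name__, bound_result)  # 6 TypeError 15
```

Whether this is worth doing on a real DataFrame is a separate question; the sketch only shows why the class-level patch binds and the instance-level one does not.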

Upvotes: 1
