Reputation: 637
In Python, a function is also an object, thus allowing one to store it as a variable:
In [1]: f = sum
In [2]: f([1, 2, 3])
Out[2]: 6
Is it possible to achieve a similar behaviour with a class's bound method?
In practical terms, I have a function where I chain several pandas DataFrame operations to get the data into my desired format:
import pandas as pd
def myfunc(df: pd.DataFrame) -> pd.DataFrame:
    grouped = df\
        .set_index('date')\
        .groupby('class')\
        .resample('M')['value_col']\
        .count()\
        .reset_index()\
        .sort_values('date')
    return grouped
If this object functionality also extends to a class's method, independent of the instance, one could be more versatile when creating functions:
def myfunc(df: pd.DataFrame, f: str) -> pd.DataFrame:
    # Hypothetical names: this is the behaviour I am asking about
    if f == 'sum':
        FUNC = SumMethodOfResamplerGroupby
    else:
        FUNC = CountMethodOfResamplerGroupby
    grouped = df\
        .set_index('date')\
        .groupby('class')\
        .resample('M')['value_col']\
        .FUNC()\
        .reset_index()\
        .sort_values('date')
    return grouped
Compare that to how I need to implement this function currently, which seems much less pythonic:
def myfunc(df: pd.DataFrame, f: str) -> pd.DataFrame:
    if f == 'sum':
        grouped = df\
            .set_index('date')\
            .groupby('class')\
            .resample('M')['value_col']\
            .sum()
    else:
        grouped = df\
            .set_index('date')\
            .groupby('class')\
            .resample('M')['value_col']\
            .count()
    return grouped.reset_index().sort_values('date')
When accessing the sum and count methods in this case (without parentheses, in order to get the actual method rather than its result), Python reports them as bound method f and bound method Resampler.count, respectively, in case it is relevant (although type of course returns method in both cases).
Upvotes: 0
Views: 157
Reputation: 54726
Well, it's more work than you've implied. In order to implement the pipeline as you have it:
grouped = df\
    .set_index('date')\
    .groupby('class')\
    .resample('M')['value_col']\
    .FUNC()\
    .reset_index()\
    .sort_values('date')
FUNC would have to be a member of pandas.DataFrame, which is not what you want. You CAN add methods to an object by using what is called "monkey-patching". As in:
pd.DataFrame.SumMethodOfResamplerGroupBy = SumMethodOfResamplerGroupBy
pd.DataFrame.CountMethodOfResamplerGroupBy = CountMethodOfResamplerGroupBy
Now you could do, for example:
grouped = df\
    .set_index('date')\
    .groupby('class')\
    .resample('M')['value_col']\
    .SumMethodOfResamplerGroupBy()\
    .reset_index()\
    .sort_values('date')
In order to call FUNC, as you have it, you would have to create a FUNC function in your object:
def myfunc(df: pd.DataFrame, f: str) -> pd.DataFrame:
    if f == 'sum':
        df.FUNC = SumMethodOfResamplerGroupby
    else:
        df.FUNC = CountMethodOfResamplerGroupby
Which is maybe not as ugly as it looks.
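For comparison, a different technique that needs no patching at all is to look the method up by name with getattr, or to build a caller with operator.methodcaller. The sketch below is not the approach described above; it uses a hypothetical stand-in class (FakeResampler) and a hypothetical helper (aggregate) so that it runs without pandas:

```python
from operator import methodcaller


class FakeResampler:
    """Hypothetical stand-in for the object returned by .resample(...)[col]."""

    def __init__(self, values):
        self.values = list(values)

    def sum(self):
        return sum(self.values)

    def count(self):
        return len(self.values)


def aggregate(obj, f: str):
    # Restrict the names so the string cannot reach arbitrary attributes.
    if f not in ('sum', 'count'):
        raise ValueError(f'unsupported aggregation: {f!r}')
    # getattr(obj, 'sum') returns the bound method obj.sum; the trailing () calls it.
    return getattr(obj, f)()


r = FakeResampler([4, 5, 6])
total = aggregate(r, 'sum')
n = aggregate(r, 'count')

# methodcaller('sum') builds a callable that invokes .sum() on its argument.
also_total = methodcaller('sum')(r)
```

Applied to the pipeline in the question, the same idea would be to wrap everything up to .resample('M')['value_col'] in getattr(..., f)() and then continue with .reset_index().sort_values('date'). pandas resamplers also document an agg method that accepts a function name as a string, so .agg(f) may serve the same purpose.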
Followup
In testing my suggestion, I see the pd.DataFrame monkey-patching works just as I expect. The df monkey-patching does not; the function does not appear to be bound; it doesn't pass the automatic self. I don't quite understand that, but I'm looking.
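A likely explanation, sketched with a hypothetical stand-in class (Frame) rather than pandas itself: Python only turns a function into a bound method when the function is found on the class. A function stored directly on an instance is returned as-is, so no self is passed. Binding it explicitly with types.MethodType is one way around that:

```python
import types


class Frame:
    """Hypothetical stand-in for pd.DataFrame."""

    def __init__(self, values):
        self.values = list(values)


def total(self):
    return sum(self.values)


df = Frame([1, 2, 3])

# Assigned on the instance: stored as a plain function, so no automatic self.
df.FUNC = total
try:
    df.FUNC()
    instance_call_failed = False
except TypeError:
    instance_call_failed = True

# Binding explicitly supplies the instance as self.
df.FUNC = types.MethodType(total, df)
bound_result = df.FUNC()

# Assigned on the class: attribute lookup binds it for every instance.
Frame.total = total
class_result = Frame([10, 20]).total()
```

The class-level assignment mirrors the pd.DataFrame patching that worked, while the first instance-level assignment mirrors the df patching that did not.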
Upvotes: 1