Gaurav Bansal

Reputation: 5670

Not able to groupby apply function with two arguments in Python

My question is related to this one. I have a Pandas DataFrame as shown below. I want to calculate MAPE after grouping by period. However, I'm getting an error when trying to do so. What am I doing wrong?

import numpy as np
import pandas as pd

# Create DataFrame
df = pd.DataFrame({
    'date': ['2021-01-01', '2021-01-01', '2021-01-02', '2021-01-02', '2021-01-02'],
    'period': [1, 2, 1, 2, 3],
    'actuals': [50, 43, 42, 51, 49],
    'forecast': [49, 48, 50, 39, 51]
})

# Define MAPE
def mape(act, fct):
    return np.sum(abs((act - fct)/act))/len(act)

# Try to calculate MAPE for each period (this fails)
df.groupby('period').apply(mape, act='actuals', fct='forecast')
TypeError: mape() got multiple values for argument 'act'

Upvotes: 1

Views: 243

Answers (3)

ALollz

Reputation: 59579

Another alternative is to avoid the slow groupby + apply altogether in favor of vectorized operations that act on the entire DataFrame, followed by the built-in GroupBy.mean, which is implemented in Cython.

Perform the calculation on the whole DataFrame first, then take the mean of the resulting Series within each period.

(df['actuals'] - df['forecast']).div(df['actuals']).abs().groupby(df['period']).mean()

period
1    0.105238
2    0.175787
3    0.040816
dtype: float64

To clean this up a little bit, define a function that calculates the absolute percent error Series, then take the mean of that.

def ape(act: pd.Series, fct: pd.Series):
    return (act - fct).div(act).abs()

ape(df['actuals'], df['forecast']).groupby(df['period']).mean()

Upvotes: 1

SeaBean

Reputation: 23237

You can keep your definition of the mape() function unchanged by changing the call as follows:

df.groupby('period').apply(lambda x: mape(x['actuals'], x['forecast']))

Your way of passing parameters requires changing the function definition, as pointed out in the other answer. This is because the function needs access to the DataFrame object, in addition to the column names, in order to read the column values.

When called through a lambda in this way, the function receives the column values directly as its arguments and doesn't need the DataFrame itself.

Calling it this way has the advantage that the function doesn't need to be tailored to pandas and can be reused in other general Python code.
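As a rough illustration of that last point, the unchanged mape() also runs on plain NumPy arrays with no pandas involved. The two arrays below are just the period 1 rows from the question, retyped by hand.

```python
import numpy as np

def mape(act, fct):
    # Same definition as in the question
    return np.sum(abs((act - fct) / act)) / len(act)

# Period 1 rows from the question, as plain NumPy arrays
act = np.array([50, 42])
fct = np.array([49, 50])

result = mape(act, fct)
print(round(result, 6))  # 0.105238
```

This matches the period 1 value you get from the groupby call above.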

Upvotes: 3

Sayandip Dutta

Reputation: 15872

Change the function to:

def mape(data, act, fct):
    act = data[act]
    fct = data[fct]
    return np.sum(abs((act - fct)/act))/len(act)

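When using groupby.apply, the data for each group is passed to the function as its first positional argument. In your original call that group was bound to act, and act='actuals' then supplied act a second time, which is what the TypeError is reporting.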
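With the three-argument version, the call style from the question works as written. Here is a minimal sketch using the question's data. Selecting the two value columns before apply is my own addition, not something the function requires; it keeps the grouping column out of what gets passed to the function, which recent pandas releases warn about.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'period': [1, 2, 1, 2, 3],
    'actuals': [50, 43, 42, 51, 49],
    'forecast': [49, 48, 50, 39, 51],
})

def mape(data, act, fct):
    # data is the group's DataFrame; act and fct are column names
    act = data[act]
    fct = data[fct]
    return np.sum(abs((act - fct) / act)) / len(act)

# Keyword arguments after the function are forwarded to mape for each group
result = df.groupby('period')[['actuals', 'forecast']].apply(
    mape, act='actuals', fct='forecast'
)
print(result)
```

The result is a Series indexed by period, one MAPE value per group.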

Upvotes: 4
