Apply expanding function on dataframe

Question

I have a function that I wish to apply to subsets of a pandas DataFrame, so that for each row the function is calculated on all rows of the same group up to and including the current row, i.e. using a groupby and then expanding.

For example, this dataframe:

import pandas as pd

df = pd.DataFrame.from_dict(
    {
        'group': ['A', 'A', 'A', 'B', 'B', 'B'],
        'time': [1, 2, 3, 1, 2, 3],
        'x1': [10, 40, 30, 100, 200, 300],
        'x2': [1, 2, 1, 2, 0, 3],  # matches the table and expected result below
    }
).sort_values('time')

i.e.

    group   time    x1      x2
0   A       1       10      1
3   B       1       100     2
1   A       2       40      2
4   B       2       200     0
2   A       3       30      1
5   B       3       300     3

and this function, for example:

def foo(_df):
    return _df['x1'].max() * _df['x2'].iloc[-1]

[Edited for clarity following feedback from jezrael: my actual function is more complicated and cannot easily be broken down into components for this task. This simple function is just for an MCVE.]

I want to do something like: df['foo_result'] = df.groupby('group').expanding().apply(foo, raw=False)

To obtain this result:

    group   time    x1  x2  foo_result
0   A       1       10  1   10
3   B       1       100 2   200
1   A       2       40  2   80
4   B       2       200 0   0
2   A       3       30  1   40
5   B       3       300 3   900

The problem is that running df.groupby('group').expanding().apply(foo, raw=False) raises KeyError: 'x1'.

Is there a correct way to run this, or is it not possible to do so in pandas without breaking down my function into components?
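A short sketch of why the KeyError appears, assuming the same example data: with raw=False, expanding().apply calls the function once per column and per window, passing a Series rather than a DataFrame, so a lookup like _df['x1'] searches the row labels and fails. The names probe and seen are hypothetical and exist only for this illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'group': ['A', 'A', 'A', 'B', 'B', 'B'],
        'time': [1, 2, 3, 1, 2, 3],
        'x1': [10, 40, 30, 100, 200, 300],
        'x2': [1, 2, 1, 2, 0, 3],
    }
).sort_values('time')

seen = []  # hypothetical: collects the type name of every window received


def probe(window):
    # Record what kind of object pandas hands to the applied function.
    seen.append(type(window).__name__)
    return 0.0  # apply expects a scalar result


df.groupby('group')[['x1', 'x2']].expanding().apply(probe, raw=False)
print(set(seen))
```

Each window is a single-column Series, which is why a function written against a multi-column DataFrame cannot be passed to expanding().apply directly.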

jezrael · Accepted Answer

A possible solution is to move the expanding step inside the function and use GroupBy.apply:

def foo1(_df):
    return _df['x1'].expanding().max() * _df['x2'].expanding().apply(lambda x: x[-1], raw=True)

df['foo_result'] = df.groupby('group').apply(foo1).reset_index(level=0, drop=True)
print(df)
  group  time   x1  x2  foo_result
0     A     1   10   1        10.0
3     B     1  100   2       200.0
1     A     2   40   2        80.0
4     B     2  200   0         0.0
2     A     3   30   1        40.0
5     B     3  300   3       900.0

This is not a direct solution to the problem of applying a dataframe function to an expanding dataframe, but it achieves the same functionality.
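If the function really cannot be decomposed, one alternative is to build the expanding windows by hand and call the original foo on each growing slice of every group. This is a minimal sketch, not a built-in pandas feature: expanding_apply is a hypothetical helper, and it calls the function once per row, so it may be slow on large groups.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'group': ['A', 'A', 'A', 'B', 'B', 'B'],
        'time': [1, 2, 3, 1, 2, 3],
        'x1': [10, 40, 30, 100, 200, 300],
        'x2': [1, 2, 1, 2, 0, 3],
    }
).sort_values('time')


def foo(_df):
    # The unchanged function from the question.
    return _df['x1'].max() * _df['x2'].iloc[-1]


def expanding_apply(group_df, func):
    # Hypothetical helper: evaluate func on rows 0..i of the group for each i.
    values = [func(group_df.iloc[: i + 1]) for i in range(len(group_df))]
    # Keep the original row labels so the result aligns back onto df.
    return pd.Series(values, index=group_df.index)


# Iterate over the groups explicitly and stitch the per-group results together.
parts = [expanding_apply(g, foo) for _, g in df.groupby('group')]
df['foo_result'] = pd.concat(parts)
print(df)
```

Because the result Series carries the original index, the assignment aligns by row label, so the time-sorted row order of df does not matter.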
