Let us assume that we have a GroupBy object that was obtained as the result of a groupby operation applied to a DataFrame:
grouped = data_frame.groupby(['col_1', 'col_2'])
We can generate a new DataFrame by specifying how the values in each group should be combined into a single value. For example:
grouped.agg({'col_3': 'sum', 'col_4': 'min', 'col_5': user_defined_function})
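A minimal runnable sketch of this per-column aggregation. The sample data and the function `value_range` (a hypothetical stand-in for `user_defined_function`) are made up for illustration; note that each function only ever sees one column of one group:

```python
import pandas as pd


def value_range(s: pd.Series) -> int:
    # Hypothetical user-defined aggregation: spread of one column within one group
    return s.max() - s.min()


# Illustrative sample data (not from the question)
data_frame = pd.DataFrame({
    'col_1': ['a', 'a', 'b', 'b'],
    'col_2': ['x', 'x', 'y', 'y'],
    'col_3': [1, 2, 3, 4],
    'col_4': [5, 1, 10, 2],
    'col_5': [10, 30, 5, 6],
})

grouped = data_frame.groupby(['col_1', 'col_2'])

# The dict maps each column to the function that reduces it to a single value
result = grouped.agg({'col_3': 'sum', 'col_4': 'min', 'col_5': value_range})
print(result)
```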
In the above example we used functions that take lists (or, more precisely, Series) as input and return a single value as output. This is nice, but what I need is to use two Series as input. For example, I want to take the values from col_3 and col_4 and use them to generate a single value.
For example, I might want to find the maximum absolute difference between the corresponding values in col_3 and col_4.
Is there a way to do that in pandas?
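If you don't specify a function per column and use apply, the whole sub-DataFrame of each group is passed to the function, so all of its columns are available together. (agg with a single function generally works column by column, so apply is the safer choice here.) So: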
import numpy as np
data_frame.groupby(['col_1', 'col_2']).apply(lambda x: np.max(np.abs(x['col_3'] - x['col_4'])))
That gives the maximum absolute difference between col_3 and col_4 for each group.
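A self-contained sketch of the apply approach on made-up sample data. Selecting only `col_3` and `col_4` before `apply` is an assumption added here so the grouping columns are not passed to the lambda (newer pandas versions warn about that). The second computation is an alternative that is not part of the original answer: compute the per-row absolute difference first, then use the built-in group `max`, which avoids calling a Python function once per group:

```python
import numpy as np
import pandas as pd

# Illustrative sample data (not from the question)
data_frame = pd.DataFrame({
    'col_1': ['a', 'a', 'b', 'b'],
    'col_2': ['x', 'x', 'y', 'y'],
    'col_3': [1, 2, 3, 4],
    'col_4': [5, 1, 10, 2],
})

# apply: each group arrives as a sub-DataFrame holding col_3 and col_4 together
max_abs_diff = data_frame.groupby(['col_1', 'col_2'])[['col_3', 'col_4']].apply(
    lambda g: np.max(np.abs(g['col_3'] - g['col_4']))
)

# Alternative: add a row-wise helper column, then reduce it with a built-in aggregation
max_abs_diff_vec = (
    data_frame.assign(abs_diff=(data_frame['col_3'] - data_frame['col_4']).abs())
    .groupby(['col_1', 'col_2'])['abs_diff']
    .max()
)

print(max_abs_diff)
```

Both give one value per (col_1, col_2) group: 4 for ('a', 'x') and 7 for ('b', 'y') with this data. The apply version is more general (any function of several columns), while the helper-column version only works when the computation splits into a row-wise step followed by a standard reduction.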