Reputation: 193
I want to group by and then aggregate, but for each group I want to use a function chosen by the value in a special column that stores which function to use. It's easier to show with an example:
id | group | val | func |
---|---|---|---|
0 | 0 | 0 | "avg" |
1 | 0 | 2 | "avg" |
2 | 0 | 2 | "avg" |
3 | 1 | 0 | "med" |
4 | 1 | 2 | "med" |
In this example the expected behaviour is "avg" aggregation for group 0 and "median" for group 1. How can I make agg choose the function based on the "func" column values? I know I can compute every aggregation function for every group and then use func as a mask to pick the right values, but that does a lot of unnecessary calculation; there should be a better approach...
P.S. It's guaranteed that func is the same within each group so I don't have to worry about that.
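If that guarantee ever needs to be checked rather than assumed, a minimal sketch is to count the distinct func values per group. The frame below is the example from the question; the variable names n_funcs and is_consistent are only illustrative.

```python
import pandas as pd

# The example frame from the question.
df = pd.DataFrame({
    "id": [0, 1, 2, 3, 4],
    "group": [0, 0, 0, 1, 1],
    "val": [0, 2, 2, 0, 2],
    "func": ["avg", "avg", "avg", "med", "med"],
})

# Number of distinct func values in each group; each should be exactly 1.
n_funcs = df.groupby("group")["func"].nunique()
is_consistent = bool((n_funcs == 1).all())
```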
I've written my own solution for my specific case and am adding it to the question, but the answer below is fine too. My approach was:
import pandas as pd

# Map the short names stored in the "agg" column to pandas aggregation names.
func_dict = {"avg": "mean", "med": "median", "min": "min", "max": "max", "rnk": "first"}

def pick_price(subframe: pd.DataFrame) -> float:
    # Take the short name from the first row of the subframe
    # and look up the real name in the dict.
    func_name = func_dict[subframe["agg"].iloc[0]]
    if func_name != "first":
        # Ordinary aggregation over the prices in the subframe.
        ans = subframe["comp_price"].agg(func_name)
        return 1.0 * ans
    else:
        # "rnk": return the price from the row with the lowest rank.
        idx = subframe["rank"].idxmin()
        return 1.0 * subframe["comp_price"].loc[idx]
This function takes the subframe for one group, in which every row names the same function, and applies that function. Finally, use it: group by the column whose groups need different functions, then call the apply() method:
grouped = X.groupby("sku")
grouped.apply(pick_price)
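To see what this returns, here is a small sketch on made-up data. The sku values, prices and ranks are invented for illustration, and the function is repeated so the snippet runs on its own. Selecting the three needed columns before apply() is my addition, so the grouping column is not passed to the function.

```python
import pandas as pd

func_dict = {"avg": "mean", "med": "median", "min": "min", "max": "max", "rnk": "first"}

def pick_price(subframe: pd.DataFrame) -> float:
    # Translate the short name found in the first row of the group.
    func_name = func_dict[subframe["agg"].iloc[0]]
    if func_name != "first":
        return 1.0 * subframe["comp_price"].agg(func_name)
    # "rnk": price of the row with the lowest rank.
    idx = subframe["rank"].idxmin()
    return 1.0 * subframe["comp_price"].loc[idx]

# Invented sample data: three SKUs, each with its own aggregation rule.
X = pd.DataFrame({
    "sku": ["a", "a", "a", "b", "b", "c", "c"],
    "agg": ["avg", "avg", "avg", "med", "med", "rnk", "rnk"],
    "comp_price": [10.0, 20.0, 30.0, 5.0, 7.0, 100.0, 50.0],
    "rank": [1, 2, 3, 1, 2, 2, 1],
})

# One value per sku: mean for "a", median for "b", lowest-rank price for "c".
result = X.groupby("sku")[["agg", "comp_price", "rank"]].apply(pick_price)
```

The result is a Series indexed by sku, with one price per group.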
Upvotes: 1
Views: 78
Reputation: 260875
I would use a dictionary of group: function:
f = {0: 'mean', 1: 'median'}
df['out'] = df.groupby('group')['val'].transform(lambda s: s.agg(f.get(s.name)))
Output:
id group val out
0 0 0 0 1.333333
1 1 0 2 1.333333
2 2 0 2 1.333333
3 3 1 0 1.000000
4 4 1 2 1.000000
NB. The variant below is a bit hacky; I prefer the dictionary. It extracts the function name from the first row of each group. The names must be valid pandas aggregation names, like mean/median, not avg/med.
df['out'] = (df.groupby('group')['val']
               .transform(lambda s: s.agg(df.loc[s.index[0], 'func']))
            )
Output:
id group val func out
0 0 0 0 mean 1.333333
1 1 0 2 mean 1.333333
2 2 0 2 mean 1.333333
3 3 1 0 median 1.000000
4 4 1 2 median 1.000000
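If the func column holds short names such as avg/med, as in the question, the dictionary can also be built from the data instead of typed by hand. This is a minimal sketch under that assumption: name_map is a hypothetical translation table, and the loop yields one value per group (an aggregation) rather than a value broadcast to every row.

```python
import pandas as pd

# The example frame from the question, with the short names in func.
df = pd.DataFrame({
    "id": [0, 1, 2, 3, 4],
    "group": [0, 0, 0, 1, 1],
    "val": [0, 2, 2, 0, 2],
    "func": ["avg", "avg", "avg", "med", "med"],
})

# Hypothetical mapping from the short names to pandas aggregation names.
name_map = {"avg": "mean", "med": "median"}

# One function name per group (func is constant within each group).
f = df.groupby("group")["func"].first().map(name_map).to_dict()

# Compute only the requested function for each group.
out = pd.Series(
    {key: s.agg(f[key]) for key, s in df.groupby("group")["val"]},
    name="out",
)
out.index.name = "group"
```

Here f comes out as {0: 'mean', 1: 'median'}, the same dictionary used at the top of this answer.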
Upvotes: 2