Reputation: 435
I have a Dataframe like this:
id1 id2 c1 c2
1 a1 5 3
1 a1 6 2
1 a1 9 4
2 b1 5 2
2 b1 23 43
1 a1 34 32
3 3a 234 435
Now I want to add eight new columns like this:
c1_max c2_max c1_secondMax c2_secondMax c1_thirdMax c2_thirdMax c1_min c2_min
I want to groupBy id1
and id2
I already added the c1_max c2_max c1_min c2_min
columns like this:
features = ["c1","c2"]
maxData = all_data.groupby(['id1','id2'])[features].agg('max')
all_data = pd.merge(all_data, maxData.reset_index(), suffixes=["", "_max"], how='left', on=['id1', 'id2'])
minData= all_data.groupby(['id1','id2'])[features].agg('min')
all_data = pd.merge(all_data, minData.reset_index(), suffixes=["", "_max"], how='left', on=['id1', 'id2'])
But, how do I add _secondMax
and _thirdMax
. Basically, they are the second max and third max of those rows.
Upvotes: 0
Views: 49
Reputation: 153460
Let's try this:
smax = lambda x: x[x.rank(ascending = False) == 2]
smax.__name__ = 'secondMax'
tmax = lambda x: x[x.rank(ascending = False) == 3]
tmax.__name__ = 'thirdMax'
df_out = df.groupby(['id1','id2']).agg(['max','min',smax, tmax])
df_out.columns = [f'{i}_{j}' for i,j in df_out.columns]
df_out = df.merge(df_out, on=['id1','id2'])
df_out = df_out.where(df_out.astype(bool))
df_out
Output:
id1 id2 c1 c2 c1_max c1_min c1_secondMax c1_thirdMax c2_max c2_min c2_secondMax c2_thirdMax
0 1 a1 5 3 34 5 9 6 32 2 4 3
1 1 a1 6 2 34 5 9 6 32 2 4 3
2 1 a1 9 4 34 5 9 6 32 2 4 3
3 1 a1 34 32 34 5 9 6 32 2 4 3
4 2 b1 5 2 23 5 5 NaN 43 2 2 NaN
5 2 b1 23 43 23 5 5 NaN 43 2 2 NaN
6 3 3a 234 435 234 234 NaN NaN 435 435 NaN NaN
Upvotes: 1
Reputation: 88236
In the same way you have obtained maxdata
and mindata
, you can get the nth
minimums and maximums doing:
df.sort_values(['id1','id2','c1']).groupby(['id1','id2']).c1.nth(1)
id1 id2
1 a1 6
2 b1 23
Name: c1, dtype: int64
So in this case you would be obtaining the second minimum for c1
Where df.sort_values(['id1','id2','c1'])
will sort the values according id1
, id2
and c1
respectively, which is equivalent to grouping by id1
and id2
and sorting c1
:
id1 id2 c1 c2
0 1 a1 5 3
1 1 a1 6 2
2 1 a1 9 4
5 1 a1 34 32
3 2 b1 5 2
4 2 b1 23 43
6 3 3a 234 435
Upvotes: 1