Reputation: 9169
Have the below dataframe and would like to add a new column indicating the value associated with max from the groupby operation. So my DF would have many such records and I just want to loop through each group by similar to df.groupby('B')['C'].mean().idxmax()
but go through entire DF and then create a new col with result as below:
A B C D
0 05641 KMPV 45 056
1 05649 KMPV 45 056
2 05650 KMPV 45 056
3 05651 KMPV 45 056
4 05653 KMPV 45 056
5 05678 KMPV 45 056
6 05654 KMPV 45 056
7 05609 KMPV 45 056
8 05620 KMPV 45 056
9 05633 KMPV 45 056
10 05601 KMPV 45 056
11 05602 KMPV 45 056
12 05603 KMPV 45 056
13 05662 KMVL 23 056
14 05665 KMVL 23 056
15 05672 KMVL 23 056
16 05677 KMVL 23 056
17 05648 KMVL 23 056
18 05652 KMVL 23 056
19 05680 KMVL 23 056
20 05655 KMVL 23 056
21 05656 KMVL 23 056
22 05657 KMVL 23 056
23 05661 KMVL 23 056
24 05681 KMVL 23 056
25 05682 KMVL 23 056
26 05647 K1V4 30 056
27 05658 K1V4 30 056
New DataFrame with column indicating that KMPV has highest value.
A B C D newval
0 05641 KMPV 45 056 KMPV
1 05649 KMPV 45 056 KMPV
2 05650 KMPV 45 056 KMPV
3 05651 KMPV 45 056 KMPV
4 05653 KMPV 45 056 KMPV
5 05678 KMPV 45 056 KMPV
6 05654 KMPV 45 056 KMPV
7 05609 KMPV 45 056 KMPV
8 05620 KMPV 45 056 KMPV
9 05633 KMPV 45 056 KMPV
10 05601 KMPV 45 056 KMPV
11 05602 KMPV 45 056 KMPV
12 05603 KMPV 45 056 KMPV
13 05662 KMVL 23 056 KMPV
14 05665 KMVL 23 056 KMPV
15 05672 KMVL 23 056 KMPV
16 05677 KMVL 23 056 KMPV
17 05648 KMVL 23 056 KMPV
18 05652 KMVL 23 056 KMPV
19 05680 KMVL 23 056 KMPV
20 05655 KMVL 23 056 KMPV
21 05656 KMVL 23 056 KMPV
22 05657 KMVL 23 056 KMPV
23 05661 KMVL 23 056 KMPV
24 05681 KMVL 23 056 KMPV
25 05682 KMVL 23 056 KMPV
26 05647 K1V4 30 056 KMPV
27 05658 K1V4 30 056 KMPV
Putting it more simply. How do I select the max from the below groupby? In other words I want to filter to 056 -> KMPV and 100 -> KJRB since they have the highest numbers and so forth.
A B
056 K1V4 30
KMPV 45
KMVL 23
100 KJRB 130
KNYC 84
KTEB 80
Upvotes: 0
Views: 279
Reputation: 14179
Quick and dirty.
Data:
{'A': {0: 5641, 1: 5649, 2: 5650, 3: 5651, 4: 5653, 5: 5678, 6: 5654, 7: 5609, 8: 5620, 9: 5633, 10: 5601, 11: 5602, 12: 5603, 13: 5662, 14: 5665, 15: 5672, 16: 5677, 17: 5648, 18: 5652, 19: 5680, 20: 10155, 21: 10156, 22: 10157, 23: 10161, 24: 10181, 25: 10182, 26: 10147, 27: 10158}, 'C': {0: 45, 1: 45, 2: 45, 3: 45, 4: 45, 5: 45, 6: 45, 7: 45, 8: 45, 9: 45, 10: 45, 11: 45, 12: 45, 13: 23, 14: 23, 15: 23, 16: 23, 17: 23, 18: 23, 19: 23, 20: 23, 21: 23, 22: 23, 23: 23, 24: 23, 25: 23, 26: 30, 27: 30}, 'B': {0: 'KMPV', 1: 'KMPV', 2: 'KMPV', 3: 'KMPV', 4: 'KMPV', 5: 'KMPV', 6: 'KMPV', 7: 'KMPV', 8: 'KMPV', 9: 'KMPV', 10: 'KMPV', 11: 'KMPV', 12: 'KMPV', 13: 'KMVL', 14: 'KMVL', 15: 'KMVL', 16: 'KMVL', 17: 'KMVL', 18: 'KMVL', 19: 'KMVL', 20: 'KMVL', 21: 'KMVL', 22: 'KMVL', 23: 'KMVL', 24: 'KMVL', 25: 'KMVL', 26: 'K1V4', 27: 'K1V4'}, 'D': {0: 56, 1: 56, 2: 56, 3: 56, 4: 56, 5: 56, 6: 56, 7: 56, 8: 56, 9: 56, 10: 56, 11: 56, 12: 56, 13: 56, 14: 56, 15: 56, 16: 56, 17: 56, 18: 56, 19: 56, 20: 101, 21: 101, 22: 101, 23: 101, 24: 101, 25: 101, 26: 101, 27: 101}}
Code:
import pandas as pd
df = pd.read_csv('so.csv')
df_ = df.groupby(['D'])
# Result of above line is a tuple.
for g, v in df_:
# Get max C of each group. Then get column B.
x = v[v['C'] == v['C'].max()]['B'].iat[0]
df.loc[df['D'] == g,'newval'] = x
print df
Result:
A B C D newval
0 5641 KMPV 45 56 KMPV
...
13 5662 KMVL 23 56 KMPV
...
20 10155 KMVL 23 101 K1V4
...
26 10147 K1V4 30 101 K1V4
27 10158 K1V4 30 101 K1V4
Upvotes: 2