Reputation: 17164
I have a dataframe like this:
import pandas as pd

df = pd.DataFrame({'A': list('ababababba'),
                   'B': [1, 1, 1, 2, 2, 1, 1, 2, 1, 1],
                   'C': [2.0, 5.0, 8.0, 1.0, 2.0, 9.0, 2.0, 4.0, 5.0, 3.0],
                   'D': [10, 20, 30, 10, 20, 30, 20, 40, 50, 10]})
Required (for each A, B group, keep only the rows where D is the group's min or max):
   A  B    C   D
0  a  1  2.0  10  # a1 min keep
1  b  1  5.0  20  # b1 min keep
2  a  1  8.0  30  # a1 max keep
3  b  2  1.0  10
4  a  2  2.0  20
                  # row 5 (b1) removed
                  # row 6 (a1) removed
7  b  2  4.0  40
8  b  1  5.0  50  # b1 max keep
9  a  1  3.0  10  # a1 min keep
Related links: Min and max row from pandas groupby
Max and min from two series in pandas groupby
Max and Min date in pandas groupby
pandas groupby and then select a row by value of column (min,max, for example)
Upvotes: 5
Views: 3339
Reputation: 150825
Do you want this?
df.groupby(['A', 'B']).D.agg(['min', 'max'])
Output:
+---+---+-----+-----+
| | | min | max |
+---+---+-----+-----+
| A | B | | |
+---+---+-----+-----+
| a | 1 | 10 | 30 |
| | 2 | 20 | 20 |
| b | 1 | 20 | 50 |
| | 2 | 10 | 40 |
+---+---+-----+-----+
Edit: If you want all rows whose D is either the min or the max of its group, consider transform, which broadcasts each group's result back onto the original rows:
groups = df.groupby(['A', 'B']).D
min_val = groups.transform('min')
max_val = groups.transform('max')
df[(df.D == min_val) | (df.D == max_val)]
Output:
+---+---+---+-----+----+
| | A | B | C | D |
+---+---+---+-----+----+
| 0 | a | 1 | 2.0 | 10 |
| 1 | b | 1 | 5.0 | 20 |
| 2 | a | 1 | 8.0 | 30 |
| 3 | b | 2 | 1.0 | 10 |
| 4 | a | 2 | 2.0 | 20 |
| 7 | b | 2 | 4.0 | 40 |
| 8 | b | 1 | 5.0 | 50 |
| 9 | a | 1 | 3.0 | 10 |
+---+---+---+-----+----+
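For anyone who wants to run the transform approach end to end, here is a minimal self-contained sketch using the question's data. The helper name `keep_group_extremes` is made up for illustration and is not a pandas function. It passes the string names `'min'` and `'max'` to `transform` rather than the Python builtins.

```python
import pandas as pd

df = pd.DataFrame({
    'A': list('ababababba'),
    'B': [1, 1, 1, 2, 2, 1, 1, 2, 1, 1],
    'C': [2.0, 5.0, 8.0, 1.0, 2.0, 9.0, 2.0, 4.0, 5.0, 3.0],
    'D': [10, 20, 30, 10, 20, 30, 20, 40, 50, 10],
})


def keep_group_extremes(frame, keys, value):
    """Hypothetical helper: keep rows whose `value` equals the min or max of their `keys` group."""
    grouped = frame.groupby(keys)[value]
    # transform returns a Series aligned with the original index,
    # so it can be compared row by row against the value column.
    group_min = grouped.transform('min')
    group_max = grouped.transform('max')
    return frame[(frame[value] == group_min) | (frame[value] == group_max)]


result = keep_group_extremes(df, ['A', 'B'], 'D')
print(result.index.tolist())  # [0, 1, 2, 3, 4, 7, 8, 9]
```

Because the comparison is by value, ties are all kept: rows 0 and 9 both carry the a1 minimum of 10, so both survive, while rows 5 and 6 are dropped.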
Upvotes: 5