Reputation: 191
I have a dataframe df
that look like this.
df = pd.DataFrame(
{
"id": [101, 102, 102, 103, 201, 202, 202, 203],
"type": ["A1", "B2", "B3", "A3", "A1", "B2", "B3", "A1"],
"Val": np.random.randn(8),
}
)
id type value
0 101 A1 -0.238221
1 102 B2 1.051656
2 102 B3 -1.335417
3 102 A3 0.859323
4 201 A1 -0.819574
5 202 B2 -0.589278
6 202 B3 0.925917
7 203 A1 -0.044021
First, I applied the following aggregation to get the certain type
of each id
idx = df.groupby("id")["type"].transform(lambda x: x.str.contains("B"))
df[idx]
id type value
1 102 B2 1.051656
2 102 B3 -1.335417
5 202 B2 -0.589278
6 202 B3 0.925917
Now, here is my question. I need to drop all records for any id
where type
value is not the maximum. By maximum, I mean there is this precedence:
An > Ak where n > k. For example, for id
102, B3 > B2, so the row with B2 type for the id
102 should be dropped. Please notice that for any id
, initial dataframe does not contain any duplicate type
column. That is, there is no such situation where we have two B2 for id 102. For example:
id type value
1 102 B2 1.051656
2 102 B2 -1.335417
So, we don't need to consider equality situation on type
column for any id
.
What I've tried is this; which gives me the maximum values as I wanted.
df[idx].groupby("id").max()
type value
id
102 B3 1.051656
202 B3 0.925917
However, I need to modify my original dataframe df
, so that any other record other than the ones above has been dropped. And I need to do this operation for any type
A,B,...Z In other words, I need df
to look like this in the end.
id type value
0 101 A1 -0.238221
2 102 B3 -1.335417
3 102 A3 0.859323
4 201 A1 -0.819574
6 202 B3 0.925917
7 203 A1 -0.044021
EDIT: Fringe case
There is one exceptional type
, BA. And the precedence for that type is determined as follows:
There are limited possibilities for BA type
value: zBA or yBA
Precedence here is zBA > yBA
For example
id type value
0 102 zBA -0.238221
1 102 yBA 1.051656
Output
id type value
0 102 zBA -0.238221
Upvotes: 2
Views: 62
Reputation: 28729
You can pull the letters and numbers into individual columns, groupby on the letters to get the maximum numbers, filter on the numbers and get your final dataframe:
(df.assign(letter = df['type'].str[0],
number = df['type'].str[-1],
filt = lambda df: df.number.eq(df.groupby(['id', 'letter'])
.number.transform('max')))
.loc[lambda df: df.filt, df.columns])
id type value
0 101 A1 -0.238221
2 102 B3 -1.335417
3 102 A3 0.859323
4 201 A1 -0.819574
6 202 B3 0.925917
7 203 A1 -0.044021
Upvotes: 2