zodiac645
zodiac645

Reputation: 191

Get the indices of the maximum value of each group

I have a dataframe df that look like this.

df = pd.DataFrame(
    {
        "id": [101, 102, 102, 103, 201, 202, 202, 203],
        "type": ["A1", "B2", "B3", "A3", "A1", "B2", "B3", "A1"],
        "Val": np.random.randn(8),
    }
)

     id  type       value
0   101   A1    -0.238221
1   102   B2     1.051656
2   102   B3    -1.335417
3   102   A3     0.859323
4   201   A1    -0.819574
5   202   B2    -0.589278
6   202   B3     0.925917
7   203   A1    -0.044021

First, I applied the following aggregation to get the certain type of each id

    idx = df.groupby("id")["type"].transform(lambda x: x.str.contains("B"))

    df[idx]
    
    id    type      value
1   102   B2     1.051656
2   102   B3    -1.335417
5   202   B2    -0.589278
6   202   B3     0.925917

Now, here is my question. I need to drop all records for any id where type value is not the maximum. By maximum, I mean there is this precedence:

An > Ak where n > k. For example, for id 102, B3 > B2, so the row with B2 type for the id 102 should be dropped. Please notice that for any id, initial dataframe does not contain any duplicate type column. That is, there is no such situation where we have two B2 for id 102. For example:

    id    type      value
1   102   B2     1.051656
2   102   B2    -1.335417

So, we don't need to consider equality situation on type column for any id.

What I've tried is this; which gives me the maximum values as I wanted.

df[idx].groupby("id").max()
      type  value
id      
102   B3    1.051656
202   B3    0.925917

However, I need to modify my original dataframe df, so that any other record other than the ones above has been dropped. And I need to do this operation for any type A,B,...Z In other words, I need df to look like this in the end.

     id  type       value
0   101   A1    -0.238221
2   102   B3    -1.335417
3   102   A3     0.859323
4   201   A1    -0.819574
6   202   B3     0.925917
7   203   A1    -0.044021

EDIT: Fringe case

There is one exceptional type, BA. And the precedence for that type is determined as follows:

There are limited possibilities for BA type value: zBA or yBA

Precedence here is zBA > yBA

For example

     id  type       value
0   102   zBA    -0.238221
1   102   yBA     1.051656

Output

     id  type       value
0   102   zBA    -0.238221

Upvotes: 2

Views: 62

Answers (1)

sammywemmy
sammywemmy

Reputation: 28729

You can pull the letters and numbers into individual columns, groupby on the letters to get the maximum numbers, filter on the numbers and get your final dataframe:

 (df.assign(letter = df['type'].str[0], 
            number = df['type'].str[-1], 
            filt = lambda df: df.number.eq(df.groupby(['id', 'letter'])
                                            .number.transform('max')))
    .loc[lambda df: df.filt, df.columns])

    id type     value
0  101   A1 -0.238221
2  102   B3 -1.335417
3  102   A3  0.859323
4  201   A1 -0.819574
6  202   B3  0.925917
7  203   A1 -0.044021

Upvotes: 2

Related Questions