Pandas: keep first row of duplicated indices of second level of multi index

Question

I found lots of drop_duplicates for index when both multi level indices are the same but, I would like to keep the first row of a multi index when the second level of the multi index has duplicates. So here:

|                                |   col_0 |   col_1 |   col_2 |   col_3 |   col_4 |
|:-------------------------------|--------:|--------:|--------:|--------:|--------:|
|  date         |      ID        
| ('2022-01-01', 'identifier_0') |      26 |      46 |      44 |      21 |      10 |
| ('2022-01-01', 'identifier_1') |      25 |      45 |      83 |      23 |      45 |
| ('2022-01-01', 'identifier_2') |      42 |      79 |      55 |       5 |      78 |
| ('2022-01-01', 'identifier_3') |      32 |       4 |      57 |      19 |      61 |
| ('2022-01-01', 'identifier_4') |      30 |      25 |       5 |      93 |      72 |
| ('2022-01-02', 'identifier_0') |      42 |      14 |      56 |      43 |      42 |
| ('2022-01-02', 'identifier_1') |      90 |      27 |      46 |      58 |       5 |
| ('2022-01-02', 'identifier_2') |      33 |      39 |      53 |      94 |      86 |
| ('2022-01-02', 'identifier_3') |      32 |      65 |      98 |      81 |      64 |
| ('2022-01-02', 'identifier_4') |      48 |      31 |      25 |      58 |      15 |
| ('2022-01-03', 'identifier_0') |       5 |      80 |      33 |      96 |      80 |
| ('2022-01-03', 'identifier_1') |      15 |      86 |      45 |      39 |      62 |
| ('2022-01-03', 'identifier_2') |      98 |       3 |      42 |      50 |      83 |

I'd like to keep first rows with unique ID.

Corralien · Accepted Answer

If your index is a MultiIndex:

>>> df.loc[~df.index.get_level_values('ID').duplicated()]

                         col_0  col_1  col_2  col_3  col_4
date       ID                                             
2022-01-01 identifier_0     26     46     44     21     10
           identifier_1     25     45     83     23     45
           identifier_2     42     79     55      5     78
           identifier_3     32      4     57     19     61
           identifier_4     30     25      5     93     72

# Or
>>> df.groupby(level='ID').first()

              col_0  col_1  col_2  col_3  col_4
ID                                             
identifier_0     26     46     44     21     10
identifier_1     25     45     83     23     45
identifier_2     42     79     55      5     78
identifier_3     32      4     57     19     61
identifier_4     30     25      5     93     72

If your index is an Index:

>>> df.loc[~df.index.str[1].duplicated()]

                            col_0  col_1  col_2  col_3  col_4
(2022-01-01, identifier_0)     26     46     44     21     10
(2022-01-01, identifier_1)     25     45     83     23     45
(2022-01-01, identifier_2)     42     79     55      5     78
(2022-01-01, identifier_3)     32      4     57     19     61
(2022-01-01, identifier_4)     30     25      5     93     72

>>> df.groupby(df.index.str[1]).first()
              col_0  col_1  col_2  col_3  col_4
identifier_0     26     46     44     21     10
identifier_1     25     45     83     23     45
identifier_2     42     79     55      5     78
identifier_3     32      4     57     19     61
identifier_4     30     25      5     93     72

Pandas: keep first row of duplicated indices of second level of multi index

Answers (1)

Related Questions