Reputation: 161
I found lots of drop_duplicates for index when both multi level indices are the same but, I would like to keep the first row of a multi index when the second level of the multi index has duplicates. So here:
| | col_0 | col_1 | col_2 | col_3 | col_4 |
|:-------------------------------|--------:|--------:|--------:|--------:|--------:|
| date | ID
| ('2022-01-01', 'identifier_0') | 26 | 46 | 44 | 21 | 10 |
| ('2022-01-01', 'identifier_1') | 25 | 45 | 83 | 23 | 45 |
| ('2022-01-01', 'identifier_2') | 42 | 79 | 55 | 5 | 78 |
| ('2022-01-01', 'identifier_3') | 32 | 4 | 57 | 19 | 61 |
| ('2022-01-01', 'identifier_4') | 30 | 25 | 5 | 93 | 72 |
| ('2022-01-02', 'identifier_0') | 42 | 14 | 56 | 43 | 42 |
| ('2022-01-02', 'identifier_1') | 90 | 27 | 46 | 58 | 5 |
| ('2022-01-02', 'identifier_2') | 33 | 39 | 53 | 94 | 86 |
| ('2022-01-02', 'identifier_3') | 32 | 65 | 98 | 81 | 64 |
| ('2022-01-02', 'identifier_4') | 48 | 31 | 25 | 58 | 15 |
| ('2022-01-03', 'identifier_0') | 5 | 80 | 33 | 96 | 80 |
| ('2022-01-03', 'identifier_1') | 15 | 86 | 45 | 39 | 62 |
| ('2022-01-03', 'identifier_2') | 98 | 3 | 42 | 50 | 83 |
I'd like to keep first rows with unique ID.
Upvotes: 1
Views: 51
Reputation: 120429
If your index is a MultiIndex
:
>>> df.loc[~df.index.get_level_values('ID').duplicated()]
col_0 col_1 col_2 col_3 col_4
date ID
2022-01-01 identifier_0 26 46 44 21 10
identifier_1 25 45 83 23 45
identifier_2 42 79 55 5 78
identifier_3 32 4 57 19 61
identifier_4 30 25 5 93 72
# Or
>>> df.groupby(level='ID').first()
col_0 col_1 col_2 col_3 col_4
ID
identifier_0 26 46 44 21 10
identifier_1 25 45 83 23 45
identifier_2 42 79 55 5 78
identifier_3 32 4 57 19 61
identifier_4 30 25 5 93 72
If your index is an Index
:
>>> df.loc[~df.index.str[1].duplicated()]
col_0 col_1 col_2 col_3 col_4
(2022-01-01, identifier_0) 26 46 44 21 10
(2022-01-01, identifier_1) 25 45 83 23 45
(2022-01-01, identifier_2) 42 79 55 5 78
(2022-01-01, identifier_3) 32 4 57 19 61
(2022-01-01, identifier_4) 30 25 5 93 72
>>> df.groupby(df.index.str[1]).first()
col_0 col_1 col_2 col_3 col_4
identifier_0 26 46 44 21 10
identifier_1 25 45 83 23 45
identifier_2 42 79 55 5 78
identifier_3 32 4 57 19 61
identifier_4 30 25 5 93 72
Upvotes: 1