Reputation: 1490
I have a data frame df like this:
a b
10 2
3 1
0 0
0 4
....
# about 50,000+ rows
I want to select df[:5, 'a']. But when I call df.loc[:5, 'a'], I get an error: KeyError: 'Cannot get right slice bound for non-unique label: 5'. When I call df.loc[5], the result contains 250 rows, while there is just one when I use df.iloc[5]. Why does this happen, and how can I index it properly? Thank you in advance!
Upvotes: 8
Views: 22768
Reputation: 101
To filter a frame whose index is non-unique, use a boolean mask on the index instead of a label slice, something like this: df.loc[(df.index>0)&(df.index<2)]
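A minimal sketch of the mask approach, assuming a small made-up frame whose index is unsorted and contains the label 5 twice (the data and the bounds 0 and 3 are illustrative, not taken from the question):

```python
import pandas as pd

# Hypothetical data: label 5 is duplicated and the index is not sorted.
df = pd.DataFrame(
    {"a": [10, 3, 0, 0, 7], "b": [2, 1, 0, 4, 9]},
    index=[5, 1, 5, 0, 3],
)

# A boolean mask compares every index label directly, so pandas never
# has to resolve a slice bound and duplicated labels cause no error.
mask = (df.index >= 0) & (df.index <= 3)
subset = df.loc[mask, "a"]
print(subset)
```

The mask keeps rows in their original order, so the result here carries the labels 1, 0 and 3.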
Upvotes: 10
Reputation: 17
The issue is that there are multiple rows with the index label 5, so when 5 is used as a slice bound, loc does not know which of them marks the end of the slice. To see this, run df.loc[5]: you will get every row that shares that index label. You can either sort the index with sort_index, or first aggregate the data by index and then retrieve the row. Hope this helps.
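Both suggestions can be sketched as follows. The frame is made up for illustration, and sum is an arbitrary choice of aggregation; pick whatever makes sense for the real data.

```python
import pandas as pd

# Hypothetical data: label 5 is duplicated and the index is not sorted.
df = pd.DataFrame(
    {"a": [10, 3, 0, 0, 7], "b": [2, 1, 0, 4, 9]},
    index=[5, 1, 5, 0, 3],
)

# Option 1: sort the index. Once the index is monotonic, a label slice
# works even with duplicates, and it includes every row labelled 5.
sorted_df = df.sort_index()
up_to_5 = sorted_df.loc[:5, "a"]

# Option 2: collapse duplicated labels into one row each (sum is an
# assumption here), which makes the index unique.
aggregated = df.groupby(level=0).sum()
row_5 = aggregated.loc[5]

print(up_to_5)
print(row_5)
```

Note that sorting changes the row order, so a label slice on the sorted frame is not the same thing as "the first rows as they appear in the file".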
Upvotes: 0
Reputation: 42905
The error message is explained here: if the index is not monotonic, then both slice bounds must be unique members of the index.
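A small sketch of that rule, using a made-up frame in which the label 5 appears twice in non-adjacent positions. The exact wording of the exception may differ between pandas versions, so it is printed rather than assumed.

```python
import pandas as pd

# Hypothetical data: unsorted index, label 5 appears twice.
df = pd.DataFrame({"a": [10, 3, 0, 0, 7]}, index=[5, 1, 5, 0, 3])

# Neither condition for a safe label slice holds.
print(df.index.is_monotonic_increasing)
print(df.index.is_unique)

# Slicing up to the duplicated label fails because pandas cannot
# decide which occurrence of 5 is the right-hand bound.
try:
    df.loc[:5, "a"]
    raised = False
except KeyError as exc:
    raised = True
    print("KeyError:", exc)
```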
The difference between .loc and .iloc is label vs integer position based indexing (see the docs). .loc is intended to select individual labels or slices of labels. That's why .loc[5] selects all rows where the index has the value 5, which is 250 rows in your case (and the error is about a non-unique index). .iloc, in contrast, selects the row at position 5 (0-indexed). That's why you only get a single row, and its index value may or may not be 5. Hope this helps!
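To make the contrast concrete, here is a short sketch on made-up data. It also assumes that the goal in the question was to take the first five rows of column a by position; if label-based selection was intended, this does not apply.

```python
import pandas as pd

# Hypothetical data: label 5 appears twice; the row at position 5 has label 2.
df = pd.DataFrame(
    {"a": [10, 3, 0, 0, 7, 8], "b": [2, 1, 0, 4, 9, 6]},
    index=[5, 1, 5, 0, 3, 2],
)

# Label-based: every row whose index label equals 5 (a DataFrame).
by_label = df.loc[5]

# Position-based: the single row at position 5 (a Series).
by_position = df.iloc[5]

# First five rows of column "a" by position, ignoring the labels.
# df.iloc[:5, df.columns.get_loc("a")] is an equivalent spelling.
first_five_a = df["a"].iloc[:5]

print(by_label)
print(by_position)
print(first_five_a)
```

Unlike label slices with .loc, the slice in .iloc[:5] excludes its end point, so it returns positions 0 through 4.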
Upvotes: 8