Reputation: 7021
I'm trying to use .loc to index into a DataFrame with 5,272 rows and 524 columns. As far as I can tell, each row is supposed to have a unique label (the sid).
sid_year_data is a dictionary whose keys are two-digit years and whose values are the sids for that year.
[ (year, len(sids)) for year, sids in sid_year_data.items() ]
>>> [(17, 844), (18, 1299), (19, 1453), (20, 1616)]
frame_data_17 = frame_data.loc[sid_year_data[17]]
frame_data_17.shape
>>> (851, 524)
How is it possible for indexing with 844 keys to return 851 rows? Pandas indexes are not allowed to contain duplicates, are they?
Upvotes: 1
Views: 606
Reputation: 637
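The .loc method can return more rows than the number of keys passed to it when the keys themselves contain duplicates, or when the data frame's index contains duplicate labels. In the second case, every row that shares a requested label is returned, whether or not the rows hold the same values.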
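To illustrate both cases, here is a minimal sketch on a toy frame. The frame, its column name, and its labels are made up for the example and are not taken from the question's data.

```python
import pandas as pd

# Hypothetical toy frame: the label "b" appears twice in the index.
frame = pd.DataFrame(
    {"value": [1, 2, 3, 4]},
    index=["a", "b", "b", "c"],
)

# Two unique keys, but "b" matches two rows, so three rows come back.
by_dup_index = frame.loc[["a", "b"]]
print(by_dup_index.shape)

# Duplicate keys repeat a row even though each label matches only one row.
by_dup_keys = frame.loc[["a", "a", "c"]]
print(by_dup_keys.shape)
```

In both selections the result has more rows than there are distinct keys, which matches the kind of mismatch described in the question.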
Upvotes: 0
Reputation: 7021
Pandas indexes are allowed to contain duplicates, and in this case frame_data's index had 70 duplicate items.
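A rough sketch of how duplicate index labels can be detected and, if appropriate, dropped. The frame below is a small hypothetical stand-in for frame_data, and keeping the first row per sid is only one possible policy, not necessarily the right one for the real data.

```python
import pandas as pd

# Hypothetical stand-in for frame_data, with one sid repeated.
frame_data = pd.DataFrame(
    {"score": [10, 20, 30, 40]},
    index=pd.Index([101, 102, 102, 103], name="sid"),
)

# False whenever at least one label occurs more than once.
has_unique_index = frame_data.index.is_unique

# duplicated() flags every occurrence after the first, so the sum
# counts the extra rows beyond one per label.
n_duplicates = int(frame_data.index.duplicated().sum())

# keep=False flags all occurrences, which helps when inspecting them.
repeated_rows = frame_data[frame_data.index.duplicated(keep=False)]

# Assumed policy: keep only the first row for each sid.
deduplicated = frame_data[~frame_data.index.duplicated(keep="first")]

print(has_unique_index, n_duplicates, len(deduplicated))
```

Inspecting repeated_rows before dropping anything shows whether the duplicates are exact copies or conflicting records.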
Upvotes: 1