laptou

Reputation: 7021

.loc returns more rows than keys

I'm trying to use .loc to index into a DataFrame with 5,272 rows and 524 columns. As far as I can tell, each row is supposed to have a unique label (the sid).

sid_year_data is a dictionary whose keys are two-digit years and whose values are collections of sids.

[ (year, len(sids)) for year, sids in sid_year_data.items() ]

>>> [(17, 844), (18, 1299), (19, 1453), (20, 1616)]

frame_data_17 = frame_data.loc[sid_year_data[17]]

frame_data_17.shape

>>> (851, 524)

How is it possible for indexing with 844 keys to return 851 rows? Pandas indexes are not allowed to contain duplicates, are they?

Upvotes: 1

Views: 606

Answers (2)

DmitriBolt

Reputation: 637

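The .loc method can return more rows than the number of distinct keys you pass when the argument to .loc contains duplicates, or when the data frame's index contains repeated labels. In the second case, each key selects every row that carries that label, even if those rows hold different values.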
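A minimal sketch of both situations, using small made-up frames (the names frame, unique_frame and the "value" column are hypothetical and not from the question):

```python
import pandas as pd

# Hypothetical toy frame: the label "b" appears twice in the index.
frame = pd.DataFrame({"value": [1, 2, 3]}, index=["a", "b", "b"])

# Two distinct keys, but "b" matches two rows, so three rows come back.
by_dup_index = frame.loc[["a", "b"]]

# A repeated key repeats its row, even when the index itself is unique.
unique_frame = pd.DataFrame({"value": [1, 2]}, index=["a", "b"])
by_dup_keys = unique_frame.loc[["a", "a", "b"]]

print(by_dup_index.shape)  # (3, 1)
print(by_dup_keys.shape)   # (3, 1)
```

In both cases the result has more rows than there are distinct keys, which matches the symptom in the question.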

Upvotes: 0

laptou

Reputation: 7021

Pandas indexes are allowed to contain duplicates, and in this case frame_data's index had 70 duplicate items.
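For anyone hitting the same thing, here is a rough sketch of how the duplicates could be found. The frame below is a made-up stand-in for frame_data (the "score" column and the sid values are invented), and whether dropping the repeats is the right fix depends on why they exist in your data:

```python
import pandas as pd

# Hypothetical stand-in for frame_data, indexed by sid; 102 appears twice.
frame_data = pd.DataFrame(
    {"score": [10, 11, 12, 13]},
    index=pd.Index([101, 102, 102, 103], name="sid"),
)

# Quick check: False means at least one label is repeated.
print(frame_data.index.is_unique)

# keep=False flags every occurrence of a repeated label, so all
# copies can be inspected side by side.
dup_mask = frame_data.index.duplicated(keep=False)
duplicated_rows = frame_data[dup_mask]
print(duplicated_rows)

# One possible cleanup (an assumption, not necessarily what you want):
# keep only the first row for each label.
deduped = frame_data[~frame_data.index.duplicated(keep="first")]
print(deduped.index.is_unique)
```

Both Index.is_unique and Index.duplicated are standard pandas; only the toy data and the choice to keep the first occurrence are assumptions here.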

Upvotes: 1
