Ryan

Reputation: 640

What are the possible causes of this df.drop() behavior in Pandas?

I have a pandas DataFrame df, of which df2 is a subset. When I try to drop rows from df based on the index values of df2, the row counts don't add up, as shown below: df has 4748 rows and df2 has 448, but only 2116 rows remain after the drop. What might be causing this behavior? Am I completely misunderstanding how .index works?

print(df.index)

Int64Index([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, ...
    353, 354, 355, 356, 357, 358, 359, 360, 361, 362], dtype='int64', length=4748)

print(df2.index)

Int64Index([ 0, 2, 5, 7, 9, 10, 12, 15, 17, 18, ...
    106, 123, 130, 136, 196, 217, 220, 227, 232, 237], dtype='int64', length=448)

df = df.drop(index=df2.index)
print(df.index)

Int64Index([ 63, 65, 67, 74, 76, 78, 83, 84, 85, 87, ...
    352, 353, 354, 355, 356, 357, 358, 359, 360, 361], dtype='int64', length=2116)

Upvotes: 0

Views: 30

Answers (1)

Yehuda

Reputation: 1893

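Judging by the index values, it looks like there are multiple rows with the same index label: the displayed index of df ends at 362 even though df has 4748 rows. If that's the case, drop() removes every row whose label matches, so dropping, for example, 106 because it is in df2 may remove several rows from df at once. Check for duplicates in your dataframes, at least in the indices (df.index.is_unique and df.index.duplicated() are a quick way to do that).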
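A minimal sketch of that effect, using a made-up toy frame rather than your data. The assumption here is that the repeated labels come from something like concatenating frames without resetting the index; the column name x and the values are purely illustrative.

```python
import pandas as pd

# Hypothetical toy data: concatenating two frames keeps each frame's
# original labels, so the combined index is [0, 1, 2, 0, 1, 2].
df = pd.concat([
    pd.DataFrame({"x": [10, 11, 12]}),
    pd.DataFrame({"x": [20, 21, 22]}),
])

# A one-row "subset" whose only index label is 0.
df2 = df.iloc[[0]]

# drop() matches on labels, so both rows labelled 0 are removed.
dropped = df.drop(index=df2.index)
print(len(df), len(df2), len(dropped))  # 6 1 4

# Quick checks for repeated labels.
print(df.index.is_unique)           # False
print(df.index.duplicated().sum())  # 3

# One possible fix: give every row a unique label before subsetting.
df_unique = df.reset_index(drop=True)
df2_unique = df_unique.iloc[[0]]
dropped_unique = df_unique.drop(index=df2_unique.index)
print(len(dropped_unique))  # 5
```

If the index turns out to be unique after all, the cause is something else, and it would be worth looking at how df2 was built from df.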

Upvotes: 2
