Reputation: 640
I have a Pandas dataframe df of which df2 is a subset. When I try to drop the rows of df based on the index values of df2, the row count comes out wrong, as shown below. df starts with 4748 rows and df2 has 448, so I expected 4300 rows to remain, but only 2116 are left. What might be causing this behavior? Am I completely misunderstanding how .index works?
print(df.index)
Int64Index([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, ...
353, 354, 355, 356, 357, 358, 359, 360, 361, 362], dtype='int64', length=4748)
print(df2.index)
Int64Index([ 0, 2, 5, 7, 9, 10, 12, 15, 17, 18, ...
106, 123, 130, 136, 196, 217, 220, 227, 232, 237], dtype='int64', length=448)
df = df.drop(index=df2.index)
print(df.index)
Int64Index([ 63, 65, 67, 74, 76, 78, 83, 84, 85, 87, ...
352, 353, 354, 355, 356, 357, 358, 359, 360, 361], dtype='int64', length=2116)
Upvotes: 0
Views: 30
Reputation: 1893
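Based on the numbering, it looks like there are multiple records with the same index label: df has 4748 rows, yet its printed index ends at 362, and df2 has 448 rows, yet its printed index ends at 237. If that's the case, dropping, for example, label 106 because it is in df2 removes every record in df that carries that label, not just the one that appears in df2. Check your dataframes for duplicates, at least in the indices (for example with df.index.is_unique or df.index.duplicated()).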
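A minimal sketch of the mechanism described above, using a small hypothetical frame (the column name value and the data are made up for illustration) whose index repeats the labels 0, 1, 2. It shows the duplicate check and how a label-based drop removes more rows than df2 contains.

```python
import pandas as pd

# Hypothetical data: 6 rows but only 3 distinct labels (0, 1, 2), as you might
# get from pd.concat() of pieces that each start their index at 0.
df = pd.DataFrame({"value": [10, 11, 12, 20, 21, 22]}, index=[0, 1, 2, 0, 1, 2])

# df2 is a 2-row subset of df; its index is [0, 1].
df2 = df[df["value"].isin([10, 21])]

# Check for duplicate labels.
print(df.index.is_unique)           # False
print(df.index.duplicated().sum())  # 3 (rows whose label already appeared earlier)

# drop() works by label, so it removes all 4 rows labelled 0 or 1,
# not just the 2 rows that make up df2.
dropped = df.drop(index=df2.index)
print(len(df), len(df2), len(dropped))  # 6 2 2

# One possible fix (assumes df2 is built by filtering df): give df unique
# labels first, then take the subset, so each label identifies exactly one row.
df_u = df.reset_index(drop=True)
df2_u = df_u[df_u["value"].isin([10, 21])]
fixed = df_u.drop(index=df2_u.index)
print(len(fixed))  # 4
```

The reset_index(drop=True) step is only one option and assumes you can rebuild df2 after making the labels unique; if df was assembled with pd.concat, passing ignore_index=True there has the same effect of producing unique labels.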
Upvotes: 2