Reputation: 7421
Something just happened with Pandas that makes me trust it a bit less. Does anyone know why it behaves like this? For this small example the problem is easy to see, but with a larger DataFrame one would need to be careful. I almost made a mistake because of it.
import pandas as pd
df = pd.DataFrame({"A": [34, 12, 78, 84, 26], "B": [54, 87, 35, 81, 87], "C": [56, 78, 0, 14, 13], "D": [0, 87, 72, 87, 14], "E": [78, 12, 31, 0, 34]})
>> df
Then, if you look for a column which isn't there:
df['b']
KeyError: 'b'
But -
df.drop_duplicates(['b', 'D'])
...runs without error and finds the duplicate in column D.
Actually, df.drop_duplicates(['D'])
produces exactly the same result.
It catches the duplicate in column D but misses the one in column B, because 'b' is misspelled, and it neither warns you nor raises an error.
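On a large DataFrame, one defensive option is to check the column names yourself before deduplicating. The sketch below uses a hypothetical helper, `checked_drop_duplicates` (not part of pandas), that raises a KeyError when any name in `subset` is not a column, so a misspelling like 'b' fails loudly regardless of how a given pandas version handles it.

```python
import pandas as pd


def checked_drop_duplicates(df, subset, **kwargs):
    """Hypothetical helper: drop_duplicates that refuses unknown column names."""
    # Normalise a single column label to a list so the membership test works
    if isinstance(subset, str):
        subset = [subset]
    missing = [col for col in subset if col not in df.columns]
    if missing:
        # Fail loudly instead of silently ignoring a misspelled column
        raise KeyError(f"columns not in DataFrame: {missing}")
    return df.drop_duplicates(subset=subset, **kwargs)


df = pd.DataFrame({"A": [34, 12, 78, 84, 26], "B": [54, 87, 35, 81, 87],
                   "C": [56, 78, 0, 14, 13], "D": [0, 87, 72, 87, 14],
                   "E": [78, 12, 31, 0, 34]})

try:
    checked_drop_duplicates(df, ["b", "D"])  # 'b' is misspelled
except KeyError as exc:
    print("KeyError:", exc)

# With a valid column name the helper behaves like drop_duplicates
print(checked_drop_duplicates(df, ["D"]))
```

Any extra keyword arguments (for example `keep`) are passed straight through to `drop_duplicates`.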
Using Pandas 0.22.0 and Python 3.6.4.
df.drop_duplicates(['B','D'])
just returns the original DataFrame without dropping anything. Am I missing something, or is Pandas broken?
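A likely explanation for the last result: with a list `subset`, `drop_duplicates` treats a row as a duplicate only when the values in all listed columns together match an earlier row. The sketch below prints the (B, D) pairs for this DataFrame; all five are distinct, so nothing is dropped. If the intent was instead to drop a row whenever it repeats a value in B or in D (an assumption about what was wanted), one option is to combine per-column `duplicated` masks.

```python
import pandas as pd

df = pd.DataFrame({"A": [34, 12, 78, 84, 26], "B": [54, 87, 35, 81, 87],
                   "C": [56, 78, 0, 14, 13], "D": [0, 87, 72, 87, 14],
                   "E": [78, 12, 31, 0, 34]})

# subset=['B', 'D'] compares the (B, D) pairs as a whole
print(list(zip(df["B"].tolist(), df["D"].tolist())))
# [(54, 0), (87, 87), (35, 72), (81, 87), (87, 14)]

# No pair repeats, so no row counts as a duplicate
print(df.duplicated(subset=["B", "D"]).any())  # False

# Drop a row if it repeats an earlier value in B *or* in D
mask = df.duplicated(subset="B") | df.duplicated(subset="D")
print(df[~mask])  # keeps rows 0, 1 and 2
```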
Upvotes: 0
Views: 2146
Reputation: 2007
Pandas version 0.20.3, Python 3.6.
When I run this line of code:
df.drop_duplicates(['b', 'D'])
I get:
KeyError: 'b'
Something strange happens with row 4 in your example.
In the original DataFrame, df.loc[4, 'B'] is 87,
but after drop_duplicates, df.loc[4, 'B'] is 82.
It looks like some extra operation ran between these two steps.
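A quick way to test that suspicion, sketched below: `drop_duplicates` returns a new DataFrame and only removes rows, so every surviving row should be identical to the same row of the original. If the comparison prints True, a changed value such as 82 must have come from some other step.

```python
import pandas as pd

df = pd.DataFrame({"A": [34, 12, 78, 84, 26], "B": [54, 87, 35, 81, 87],
                   "C": [56, 78, 0, 14, 13], "D": [0, 87, 72, 87, 14],
                   "E": [78, 12, 31, 0, 34]})

result = df.drop_duplicates(subset=["D"])

# The original DataFrame is not modified by drop_duplicates
print(df.loc[4, "B"])  # 87

# Surviving rows should match the original rows with the same labels
print(result.equals(df.loc[result.index]))  # True
```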
Upvotes: 1