Reputation: 2953
I have installed pandas 0.17.0. I am now getting a strange error:
ValueError: keep must be either "first", "last" or False
when I attempt this:
ids=ids.drop_duplicates('ID')
This always worked in previous pandas versions, and the code has not changed. BTW, ids is a dataframe containing a column of integers...
Here is the traceback:
Traceback (most recent call last):
  File "<ipython-input-34-6e98a890591b>", line 1, in <module>
    ids=ids.drop_duplicates('ID')
  File "C:\Anaconda3\lib\site-packages\pandas\util\decorators.py", line 89, in wrapper
    return func(*args, **kwargs)
  File "C:\Anaconda3\lib\site-packages\pandas\core\series.py", line 1164, in drop_duplicates
    return super(Series, self).drop_duplicates(keep=keep, inplace=inplace)
  File "C:\Anaconda3\lib\site-packages\pandas\util\decorators.py", line 89, in wrapper
    return func(*args, **kwargs)
  File "C:\Anaconda3\lib\site-packages\pandas\core\base.py", line 576, in drop_duplicates
    duplicated = self.duplicated(keep=keep)
  File "C:\Anaconda3\lib\site-packages\pandas\util\decorators.py", line 89, in wrapper
    return func(*args, **kwargs)
  File "C:\Anaconda3\lib\site-packages\pandas\core\series.py", line 1169, in duplicated
    return super(Series, self).duplicated(keep=keep)
  File "C:\Anaconda3\lib\site-packages\pandas\util\decorators.py", line 89, in wrapper
    return func(*args, **kwargs)
  File "C:\Anaconda3\lib\site-packages\pandas\core\base.py", line 603, in duplicated
    duplicated = lib.duplicated(keys, keep=keep)
  File "pandas\lib.pyx", line 1383, in pandas.lib.duplicated (pandas\lib.c:24490)
ValueError: keep must be either "first", "last" or False
Note the keep=keep? The default in pandas 0.17.0 for drop_duplicates is keep='first'. So if I don't specify it, shouldn't it default to that? And why would I get an error here? Is this a bug in pandas 0.17.0?
Upvotes: 3
Views: 7209
Reputation: 46351
I tried the new syntax (using keep, which was previously take_last)...
import pandas as pd
df = pd.DataFrame({'c1': ['cat'] * 3 + ['dog'] * 4,
                   'c2': [1, 1, 2, 3, 3, 4, 4]})
print(df)
print(df.drop_duplicates())
print(df.drop_duplicates(['c1', 'c2'], keep='first'))
print(df.drop_duplicates(['c1', 'c2'], keep='last'))
print(df.drop_duplicates(['c1', 'c2'], keep=False))  # drops every duplicated row; only the ('cat', 2) row stays
By default, drop_duplicates() uses keep='first' and takes all columns into account.
Upvotes: 1
Reputation: 394091
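The error indicates that ids is in fact a Series (the traceback goes through pandas\core\series.py), for which the first param is the keep param, so 'ID' is being passed as keep. If ids really were a df, this error would not happen, as the first param of drop_duplicates on a DataFrame is subset.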
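A minimal sketch of the difference, using made-up data (the values in ids_df are an assumption, not the asker's data). It passes keep by keyword on the Series to reproduce the same message, since newer pandas releases may treat a positional argument to Series.drop_duplicates differently from the version in the question:

```python
import pandas as pd

# Hypothetical data standing in for the asker's ids object.
ids_df = pd.DataFrame({"ID": [1, 1, 2, 3, 3]})

# On a DataFrame the first positional parameter is subset, so this works.
deduped_df = ids_df.drop_duplicates("ID")

# Selecting a single column yields a Series, which has no subset parameter.
ids_series = ids_df["ID"]
deduped_series = ids_series.drop_duplicates()

# Handing 'ID' to keep, which is what the call in the question ended up
# doing on a Series, is rejected because it is not 'first', 'last' or False.
try:
    ids_series.drop_duplicates(keep="ID")
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(type(ids_df).__name__, type(ids_series).__name__)
print(error_message)
```

Checking type(ids) before the call is a quick way to see which of the two signatures applies.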
Upvotes: 4