Reputation: 123
For the subset argument of drop_duplicates, I want to specify the first n-1 columns (every column except the last). How do I do that?
For example, in the following dataset:
   0   1  2   3   4  5   6
0  0  12  1  99  23  2  75
1  0  12  1  99  23  2  66
2  5  12  1  99  23  2  66
I want the result to be the 1st and 3rd rows only:
   0   1  2   3   4  5   6
0  0  12  1  99  23  2  75
1  5  12  1  99  23  2  66
If I do something like the following, I get an error:
df.drop_duplicates(subset=[0:df.shape[1]-1],keep='first',inplace=True)
Upvotes: 2
Views: 2637
Reputation: 402353
You're close, but it's easier to slice the column names directly:
df.drop_duplicates(subset=df.columns[:-1], keep='first')
   0   1  2   3   4  5   6
0  0  12  1  99  23  2  75
2  5  12  1  99  23  2  66
Where,
df.columns[:-1].tolist()
['0', '1', '2', '3', '4', '5']
This generalises to any DataFrame, regardless of what its column names are.
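For anyone who wants to try this end to end, here is a minimal sketch that rebuilds the example frame from the question and applies the same call. The string column names and the variable names (`subset_cols`, `deduped`) are my assumptions, chosen to match the output shown above.

```python
import pandas as pd

# Rebuild the question's example; column names are assumed to be strings.
df = pd.DataFrame(
    [[0, 12, 1, 99, 23, 2, 75],
     [0, 12, 1, 99, 23, 2, 66],
     [5, 12, 1, 99, 23, 2, 66]],
    columns=list("0123456"),
)

# Every column label except the last one.
subset_cols = df.columns[:-1].tolist()

# Rows count as duplicates when they match on subset_cols only;
# keep="first" retains the first row of each duplicate group.
deduped = df.drop_duplicates(subset=subset_cols, keep="first")
print(deduped)
```

Note that the original index labels (0 and 2) are kept. If you want them renumbered, chaining `.reset_index(drop=True)` should do it.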
Upvotes: 2
Reputation: 323226
You can use duplicated:
df[~df.iloc[:,:-1].duplicated()]
Out[53]:
   0   1  2   3   4  5   6
0  0  12  1  99  23  2  75
2  5  12  1  99  23  2  66
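To make the steps visible, here is a small sketch that builds the boolean mask separately before filtering. The frame is rebuilt from the question's data with assumed string column names, and `mask` and `result` are just illustrative names.

```python
import pandas as pd

# Rebuild the question's example; column names are assumed to be strings.
df = pd.DataFrame(
    [[0, 12, 1, 99, 23, 2, 75],
     [0, 12, 1, 99, 23, 2, 66],
     [5, 12, 1, 99, 23, 2, 66]],
    columns=list("0123456"),
)

# Select all but the last column by position, then flag rows that repeat
# an earlier row (the default keep="first" marks later copies as True).
mask = df.iloc[:, :-1].duplicated()

# Invert the mask to keep only the first occurrence of each group.
result = df[~mask]
print(result)
```

Because `iloc` selects by position, this does not depend on the column labels at all.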
Upvotes: 3