sranga

Reputation: 123

DataFrame drop_duplicates with a subset of columns

For the subset argument, I want to specify the first n-1 columns (every column except the last). How can I do that?

For example, in the following dataset:

   0   1  2   3   4  5   6
0  0  12  1  99  23  2  75
1  0  12  1  99  23  2  66
2  5  12  1  99  23  2  66

I want the result to contain only the 1st and 3rd rows:

   0   1  2   3   4  5   6
0  0  12  1  99  23  2  75
1  5  12  1  99  23  2  66

If I try something like the following, I get an error:

df.drop_duplicates(subset=[0:df.shape[1]-1],keep='first',inplace=True)
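The error comes from Python itself rather than from pandas: slice notation such as 0:5 is only allowed inside subscript brackets (obj[0:5]), not inside a list literal. The sketch below checks this with the built-in compile(). The `fixed` line is a hypothetical positional alternative that assumes the default integer column labels 0..n-1, as in the example data; the helper name is_valid_python is also just for illustration.

```python
def is_valid_python(source):
    """Return True if `source` compiles, False if it raises SyntaxError."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True

# The line from the question: a slice inside [...] is not a valid list literal.
attempt = "df.drop_duplicates(subset=[0:df.shape[1]-1], keep='first', inplace=True)"

# Hypothetical fix: build the labels 0..n-2 explicitly. This only matches the
# column names when they are the default integer labels.
fixed = "df.drop_duplicates(subset=list(range(df.shape[1] - 1)), keep='first', inplace=True)"

print(is_valid_python(attempt))  # False
print(is_valid_python(fixed))    # True
```

Because the check happens at compile time, df does not even need to exist for the first line to be rejected.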

Upvotes: 2

Views: 2637

Answers (2)

cs95

Reputation: 402353

You're close, but it's easier to index on the column labels directly:

df.drop_duplicates(subset=df.columns[:-1], keep='first')

   0   1  2   3   4  5   6
0  0  12  1  99  23  2  75
2  5  12  1  99  23  2  66

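Here, the subset is every column label except the last (in this DataFrame the labels happen to be strings):

df.columns[:-1].tolist()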
['0', '1', '2', '3', '4', '5']

This generalises to any DataFrame, regardless of its column names.
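A minimal sketch of the drop_duplicates approach on the question's data, using hypothetical column names ('id', 'a', ..., 'price') to show that nothing depends on the labels. It also uses ignore_index=True (available in pandas 1.0 and later) to renumber the kept rows, which matches the 0, 1 index shown in the question's expected output.

```python
import pandas as pd

# The question's data, with hypothetical column names.
df = pd.DataFrame(
    [[0, 12, 1, 99, 23, 2, 75],
     [0, 12, 1, 99, 23, 2, 66],
     [5, 12, 1, 99, 23, 2, 66]],
    columns=['id', 'a', 'b', 'c', 'd', 'e', 'price'],
)

subset = df.columns[:-1]        # every label except the last one
print(subset.tolist())          # ['id', 'a', 'b', 'c', 'd', 'e']

# Rows 0 and 1 agree on every column in `subset`, so row 1 is dropped.
result = df.drop_duplicates(subset=subset, keep='first')
print(result.index.tolist())    # [0, 2]

# ignore_index=True renumbers the surviving rows 0, 1, ...
renumbered = df.drop_duplicates(subset=subset, keep='first', ignore_index=True)
print(renumbered)
```

Slicing df.columns keeps the selection positional while still passing labels, so the same line works whether the labels are integers, strings, or a mix.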

Upvotes: 2

BENY

Reputation: 323226

You can use duplicated:

df[~df.iloc[:,:-1].duplicated()]
   0   1  2   3   4  5   6
0  0  12  1  99  23  2  75
2  5  12  1  99  23  2  66
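A sketch of how the mask-based version works, rebuilt on the question's data (default integer labels). duplicated() marks each row that repeats an earlier one (keep='first' is its default), iloc[:, :-1] restricts the comparison to every column but the last by position, and ~ inverts the mask so only first occurrences are kept. The last line checks that it selects the same rows as the drop_duplicates answer.

```python
import pandas as pd

df = pd.DataFrame(
    [[0, 12, 1, 99, 23, 2, 75],
     [0, 12, 1, 99, 23, 2, 66],
     [5, 12, 1, 99, 23, 2, 66]]
)

# Boolean Series: True where the row repeats an earlier row on columns 0..5.
mask = df.iloc[:, :-1].duplicated()
print(mask.tolist())  # [False, True, False]

# Keep the rows that are NOT duplicates.
result = df[~mask]

# Same rows as df.drop_duplicates(subset=df.columns[:-1]).
print(result.equals(df.drop_duplicates(subset=df.columns[:-1])))  # True
```

Since iloc selects by position, this variant never looks at the column labels at all.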

Upvotes: 3
