DataFrame drop_duplicates with a subset of columns

Question

For the subset argument, I want to specify the first n-1 columns (every column except the last). How can I do that?

For example, given the following dataset:

   0   1  2   3   4  5   6
0  0  12  1  99  23  2  75
1  0  12  1  99  23  2  66
2  5  12  1  99  23  2  66

I want the result to contain only the 1st and 3rd rows:

   0   1  2   3   4  5   6
0  0  12  1  99  23  2  75
1  5  12  1  99  23  2  66

If I do something like the following, I get an error:

df.drop_duplicates(subset=[0:df.shape[1]-1],keep='first',inplace=True)

BENY · Accepted Answer

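You can use duplicated on every column except the last, then negate the resulting boolean mask to keep the first occurrence of each row: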

df[~df.iloc[:,:-1].duplicated()]
Out[53]: 
   0   1  2   3   4  5   6
0  0  12  1  99  23  2  75
2  5  12  1  99  23  2  66
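
For anyone who would rather stay with drop_duplicates: the attempt in the question fails because slice notation such as 0:n is only valid inside indexing brackets, not inside a list literal. Slicing df.columns instead gives the labels that subset expects. The following is a minimal sketch, not part of the original answer; it rebuilds the question's data with default integer column labels, and the variable names are only illustrative.

```python
import pandas as pd

# Rebuild the example data from the question (default integer column labels).
df = pd.DataFrame(
    [
        [0, 12, 1, 99, 23, 2, 75],
        [0, 12, 1, 99, 23, 2, 66],
        [5, 12, 1, 99, 23, 2, 66],
    ]
)

# Approach from the answer: mark rows that repeat an earlier row on all
# columns except the last, then keep the rows that are NOT marked.
mask_result = df[~df.iloc[:, :-1].duplicated()]

# Equivalent approach with drop_duplicates: pass the labels of all columns
# except the last as the subset.
subset_result = df.drop_duplicates(subset=df.columns[:-1].tolist(), keep="first")

# Optional: renumber the index to 0..k-1, as in the question's desired output.
renumbered = subset_result.reset_index(drop=True)

print(subset_result)
print(renumbered)
```

Both approaches keep the rows originally labelled 0 and 2. The reset_index(drop=True) step is only needed if the result should be renumbered from 0, as in the output the question asks for.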
