Pandas drop duplicate rows INCLUDING index

Question

I know how to drop duplicate rows based on column data. I also know how to drop duplicate rows based on the row index. My question is: is there a way to drop duplicate rows based on the index and one column together?

Thanks!

JJ101 · Accepted Answer

This can be done by turning the index into a column.

Below is a sample data set (fyi, I think someone downvoted your question because it didn't include a sample data set):

import pandas as pd

df = pd.DataFrame({'a': [1,2,2,3,4,4,5], 'b': [2,2,2,3,4,5,5]}, index=[0,1,1,2,3,5,5])

Output:

   a  b
0  1  2
1  2  2
1  2  2
2  3  3
3  4  4
5  4  5
5  5  5

Then you can use the following line. reset_index() moves the index into a new column, which is named index by default when the index has no name. You can then drop duplicates based on that column and the other column (b in this case). Afterward, set_index('index') restores the original index values:

df.reset_index().drop_duplicates(subset=['index','b']).set_index('index')

Output:

       a  b
index      
0      1  2
1      2  2
2      3  3
3      4  4
5      4  5
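
If you would rather not reshape the frame, one possible variation is to build a boolean mask and filter with it. The sketch below is only an illustration of that idea, not part of the original answer: the helper column name _idx is made up for the example, and it assumes you want to keep the first occurrence of each (index, b) pair, which is what drop_duplicates does by default.

```python
import pandas as pd

# Sample data from the answer above
df = pd.DataFrame(
    {"a": [1, 2, 2, 3, 4, 4, 5], "b": [2, 2, 2, 3, 4, 5, 5]},
    index=[0, 1, 1, 2, 3, 5, 5],
)

# Copy the index into a temporary helper column ("_idx" is a hypothetical
# name chosen for this sketch) and flag rows whose (_idx, b) pair has
# already appeared earlier in the frame.
dup_mask = df.assign(_idx=df.index).duplicated(subset=["_idx", "b"])

# Convert the mask to a plain NumPy array so the filter is purely
# positional and never tries to align on the duplicated index labels.
result = df[~dup_mask.to_numpy()]

print(result)
```

This should keep the same rows as the reset_index() version. The one visible difference is that the index keeps its original name (none here) instead of being labelled index.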
