Comparing two dataframes and outputting the index of each duplicated row once

Question

I need help with comparing two dataframes. For example:

The first dataframe is

df_1 = 
    0   1   2   3   4   5
0   1   1   1   1   1   1
1   2   2   2   2   2   2
2   3   3   3   3   3   3
3   4   4   4   4   4   4
4   2   2   2   2   2   2
5   5   5   5   5   5   5
6   1   1   1   1   1   1
7   6   6   6   6   6   6

The second dataframe is

df_2 = 
    0   1   2   3   4   5
0   1   1   1   1   1   1
1   2   2   2   2   2   2
2   3   3   3   3   3   3
3   4   4   4   4   4   4
4   5   5   5   5   5   5
5   6   6   6   6   6   6

Is there a way (without using a for loop) to find the index of the rows of df_1 whose values match a row of df_2? In the example above, my expected output is:

index = 
0
1
2
3
5
7

The "index" variable above should have as many entries as df_2 has rows.

If a row of df_2 appears in df_1 more than once, I only need the index of its first appearance, which is why I don't need indices 4 and 6.

Please help. Thank you so much!

Tommy

jezrael · Accepted Answer

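Use DataFrame.drop_duplicates to keep only the first occurrence of each row in df_1, then DataFrame.reset_index to convert the index to a column so the original index values are not lost, and DataFrame.merge the result with df_2. Finally, select the column called index: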

s = df_2.merge(df_1.drop_duplicates().reset_index())['index']
print(s)
0    0
1    1
2    2
3    3
4    5
5    7
Name: index, dtype: int64
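
To try the one-liner above end to end, here is a minimal self-contained sketch. It rebuilds df_1 and df_2 from the question with a list comprehension (that construction is my own shorthand, not part of the original post) and names the intermediate result first_rows purely for illustration.

```python
import pandas as pd

# Rebuild the question's frames: each row holds one value repeated six times.
df_1 = pd.DataFrame([[v] * 6 for v in [1, 2, 3, 4, 2, 5, 1, 6]])
df_2 = pd.DataFrame([[v] * 6 for v in [1, 2, 3, 4, 5, 6]])

# drop_duplicates() keeps the first occurrence of each row by default;
# reset_index() then exposes the original df_1 labels as an "index" column.
first_rows = df_1.drop_duplicates().reset_index()

# With no "on" argument, merge joins on the column labels both frames share
# (0 to 5 here), so "index" is simply carried along from first_rows.
s = df_2.merge(first_rows)["index"]
print(s.tolist())
```

Because the default merge is an inner join that follows the row order of the left frame, the result lists the matches in df_2 order.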

Detail:

print(df_2.merge(df_1.drop_duplicates().reset_index()))
   0  1  2  3  4  5  index
0  1  1  1  1  1  1      0
1  2  2  2  2  2  2      1
2  3  3  3  3  3  3      2
3  4  4  4  4  4  4      3
4  5  5  5  5  5  5      5
5  6  6  6  6  6  6      7
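
The default merge is an inner join, so a row of df_2 with no match in df_1 would be dropped and the result would be shorter than df_2. If one entry per row of df_2 is required, a left join is one possible variant. The sketch below is an assumption on my part rather than part of the answer above: it uses a hypothetical df_2 containing a row of 9s that never occurs in df_1, and converts the result to the nullable Int64 dtype so that missing matches do not turn the labels into floats.

```python
import pandas as pd

df_1 = pd.DataFrame([[v] * 6 for v in [1, 2, 3, 4, 2, 5, 1, 6]])
# Hypothetical df_2: the row of 9s has no counterpart in df_1.
df_2 = pd.DataFrame([[v] * 6 for v in [1, 2, 9, 6]])

# First occurrence of each distinct row, with the original labels kept
# in an "index" column.
first_rows = df_1.drop_duplicates().reset_index()

# how="left" keeps every row of df_2 in order; unmatched rows get NaN.
matched = df_2.merge(first_rows, how="left")["index"]

# Nullable Int64 keeps integer labels alongside missing values.
matched = matched.astype("Int64")
print(matched.tolist())
```

Here matched has the same length as df_2, and matched.isna() marks the rows of df_2 that were not found in df_1.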
