Yippee

Reputation: 337

Comparing two dataframes and outputting the index of each duplicated row once

I need help with comparing two dataframes. For example:

The first dataframe is

df_1 = 
    0   1   2   3   4   5
0   1   1   1   1   1   1
1   2   2   2   2   2   2
2   3   3   3   3   3   3
3   4   4   4   4   4   4
4   2   2   2   2   2   2
5   5   5   5   5   5   5
6   1   1   1   1   1   1
7   6   6   6   6   6   6

The second dataframe is

df_2 = 
    0   1   2   3   4   5
0   1   1   1   1   1   1
1   2   2   2   2   2   2
2   3   3   3   3   3   3
3   4   4   4   4   4   4
4   5   5   5   5   5   5
5   6   6   6   6   6   6

Is there a way (without using a for loop) to find the index of the rows of df_1 that have the same values as a row of df_2? In the example above, my expected output is:

index = 
0
1
2
3
5
7

The "index" variable above should have the same number of entries as df_2 has rows.

If a row of df_2 appears in df_1 more than once, I only need the index of its first appearance. That's why I don't need the indexes 4 and 6.
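
For anyone who wants to reproduce this, the two frames can be built roughly as below. This is only a sketch: the integer column labels 0..5 are assumed from the printed header, and `first_rows` is a name made up for illustration. It only shows which rows of df_1 are first appearances; it does not check whether they also occur in df_2.

```python
import pandas as pd

# Each row repeats a single value across six columns, as in the printed frames.
# Integer column labels 0..5 are an assumption based on the printed header.
df_1 = pd.DataFrame([[v] * 6 for v in [1, 2, 3, 4, 2, 5, 1, 6]])
df_2 = pd.DataFrame([[v] * 6 for v in [1, 2, 3, 4, 5, 6]])

# duplicated() flags every repeat of an earlier row (here rows 4 and 6),
# so negating it keeps only first appearances.
first_rows = df_1.index[~df_1.duplicated()].tolist()
print(first_rows)
```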

Please help. Thank you so much!

Tommy

Upvotes: 1

Views: 34

Answers (2)

jezrael

Reputation: 862406

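Use DataFrame.drop_duplicates to keep the first occurrence of each row in df_1, then DataFrame.reset_index to convert the index to a column so the original index values are not lost in the merge. Finally, use DataFrame.merge with df_2 and select the column called index: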

s = df_2.merge(df_1.drop_duplicates().reset_index())['index']
print(s)
0    0
1    1
2    2
3    3
4    5
5    7
Name: index, dtype: int64

Detail:

print(df_2.merge(df_1.drop_duplicates().reset_index()))
   0  1  2  3  4  5  index
0  1  1  1  1  1  1      0
1  2  2  2  2  2  2      1
2  3  3  3  3  3  3      2
3  4  4  4  4  4  4      3
4  5  5  5  5  5  5      5
5  6  6  6  6  6  6      7
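
A self-contained version of the same idea may help when trying it out. This is a sketch under a few assumptions: the frames are rebuilt with integer column labels, and `firsts`, `df_2_extra` and `s_left` are hypothetical names added for illustration. The second part shows one possible variation, `how='left'`, which keeps one result per row of df_2 and leaves NaN where df_1 has no matching row.

```python
import pandas as pd

df_1 = pd.DataFrame([[v] * 6 for v in [1, 2, 3, 4, 2, 5, 1, 6]])
df_2 = pd.DataFrame([[v] * 6 for v in [1, 2, 3, 4, 5, 6]])

# Keep first occurrences in df_1, then move the original index into a
# column named "index" so it survives the merge.
firsts = df_1.drop_duplicates().reset_index()

# Default inner merge on the shared columns 0..5; the result follows df_2's row order.
s = df_2.merge(firsts)['index']
print(s.tolist())

# Hypothetical extra row in df_2 that has no counterpart in df_1.
df_2_extra = pd.concat([df_2, pd.DataFrame([[9] * 6])], ignore_index=True)

# A left merge keeps every row of df_2_extra; unmatched rows get NaN,
# which also turns the column into floats.
s_left = df_2_extra.merge(firsts, how='left')['index']
print(s_left.tolist())
```

With the default inner merge, a row of df_2 that is missing from df_1 is silently dropped, so the left merge is worth considering if the output length must always equal the number of rows in df_2.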

Upvotes: 1

vrana95

Reputation: 521

Check this solution:

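import pandas as pd

df1 = pd.DataFrame({'0': [1, 2, 3, 4, 2, 5, 1, 6],
                    '1': [1, 2, 3, 4, 2, 5, 1, 6],
                    '2': [1, 2, 3, 4, 2, 5, 1, 6],
                    '3': [1, 2, 3, 4, 2, 5, 1, 6],
                    '4': [1, 2, 3, 4, 2, 5, 1, 6],
                    '5': [1, 2, 3, 4, 2, 5, 1, 6]})

df2 = pd.DataFrame({'0': [1, 2, 3, 4, 5, 6],
                    '1': [1, 2, 3, 4, 5, 66],
                    '2': [1, 2, 3, 4, 5, 6],
                    '3': [1, 2, 3, 4, 5, 66],
                    '4': [1, 2, 3, 4, 5, 6],
                    '5': [1, 2, 3, 4, 5, 6]})

# isin() with a DataFrame compares cell by cell on matching index and column labels.
# Indexing with the boolean frame turns non-matching cells into NaN but keeps every row.
df1[df1.isin(df2)].index.values.tolist()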

### Output
[0, 1, 2, 3, 4, 5, 6, 7]
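
To see why the isin approach above returns every index, here is a small sketch. It assumes the question's data (all 6s in the last row of df2, not the 66 values), and `mask`, `all_rows` and `matching` are names introduced only for illustration. Note that isin with a DataFrame compares cells at the same index and column labels, so it is a positional comparison and not a search for equal rows anywhere in df2.

```python
import pandas as pd

df1 = pd.DataFrame({str(c): [1, 2, 3, 4, 2, 5, 1, 6] for c in range(6)})
df2 = pd.DataFrame({str(c): [1, 2, 3, 4, 5, 6] for c in range(6)})

# Element-wise comparison, aligned on index and column labels.
mask = df1.isin(df2)

# Boolean-DataFrame indexing masks cells (NaN) but drops no rows,
# so the index still contains every row of df1.
all_rows = df1[mask].index.tolist()
print(all_rows)

# Rows where every cell equals the cell at the same labels in df2.
matching = df1.index[mask.all(axis=1)].tolist()
print(matching)
```

Rows 5 and 7 of df1 do occur in df2, but at different index positions, so this label-aligned check does not find them. That is why it cannot produce the expected output from the question.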

Upvotes: 0
