Kevin
Kevin

Reputation: 561

Compare two columns in the same dataframe and find witch row of the first column match what row from the 2nd row

I've been trying to figure out how to compare two columns that share some values between them, but at different rows.

For example

col_index col_1 col_2
1 12 34
2 16 42
3 58 35
4 99 60
5 2 12
12 35 99

In the above example, col_1 and col_2 match on several occasions: e.g. values '12' and '99'.

I need to be able to find which rows these match at so that I can get the result of col_index.

What would be the best way to do that?

Upvotes: 0

Views: 115

Answers (3)

Carmoreno
Carmoreno

Reputation: 1319

You could use isin method to get a mask, and then use it to filter the matches. Finally, you get the col_idex column and that's all. So, using your dataframe:

mask = df.col_1.isin(df.col_2)
print(df[mask].col_index.to_list()) #to_list is only to get a python list from a Serie.

Result: [1, 4, 12]

Upvotes: 1

sophocles
sophocles

Reputation: 13831

IIUC only row 2 should be removed from col_index.

You can use np.intersect1d to find the common values between the two columns and then check if these values are in your columns using isin:

import numpy as np

common_values = np.intersect1d(df.col_1,df.col_2)
res = df[(df.col_1.isin(common_values))|(df.col_2.isin(common_values))]

res

   col_index  col_1  col_2
0          1     12     34 # 12
2          3     58     35 # 35 
3          4     99     60 # 99
4          5      2     12 # 12
5         12     35     99 # 99

res[['col_index']]

   col_index
0          1
2          3
3          4
4          5
5         12

Upvotes: 1

dermen
dermen

Reputation: 5372

Simply loop over the values that are present in both columns, using the Series.isin method

# test data:
a = 12,16,58,99
b = 34,99,35,12
c = 1,2,3,5

d = pd.DataFrame({"col_1":a, "col_2":b, 'col_idx':c})

#   col_1  col_2  col_idx
#0     12     34        1
#1     16     99        2
#2     58     35        3
#3     99     12        5


for _,row in d.loc[d.col_1.isin(d.col_2)].iterrows():
    val = row.col_1
    idx1 = row.col_idx
    
    print(val, idx1, d.query("col_2==%d" % val).col_idx.values)

#12 1 [5]
#99 5 [2]

If your values are strings (instead of integers as in this example), change the query argument accordingly: query("col_2=='%s'" % val) .

Upvotes: 0

Related Questions