Is there an efficient way to match two string columns in pandas?

Question

I want to match two string columns and count number of (exact) matches. For example, there are two columns like

index   col1   col2
0       aa     ji
1       bs     aa
2       qe     bs
3       gd     aa

col1 consists of unique ids. I want to count how many times each element of col1 occurs in col2. In other words, I would like to get an output like:

col3
2
1
0
0

in above example.

I have tried above work using pandas str.contains() and for loop, but given a large number of observations, it seems too slow and inefficient. My code looks like below.

num = []
for i in range(len(col1)):
    count = col2.str.contains(col1[i]).sum()
    num_replies.append(count)

Is there a time-efficient way to do this work?

Scott Boston · Accepted Answer

Use map and value_count:

df['col3'] = df['col1'].map(df['col2'].value_counts()).fillna(0)

Output:

   index col1 col2  col3
0      0   aa   ji   2.0
1      1   bs   aa   1.0
2      2   qe   bs   0.0
3      3   gd   aa   0.0

Is there an efficient way to match two string columns in pandas?

Answers (2)

Related Questions