Pylander

Reputation: 1591

Compare Columns Row-Wise for Partial String Match

My question is similar to this: How to check whether the content of Column A is contained in Column B using Python DataFrame?

Unfortunately, the accepted answer there raises a NoneType error in my case.

I have a pandas dataframe in the following format:

id,text_1,text_2_compare
1,yyy,yy
2,yxy,xx
3,zzy,zy
4,zzy,x
5,xyx,yx

I would like to compare the columns row by row to see whether "text_2_compare" is contained in "text_1", and create a new indicator column:

id,text_1,text_2_compare,match
1,yyy,yy,1
2,yxy,xx,0
3,zzy,zy,1
4,zzy,x,0
5,xyx,yx,1

Any tips or tricks (particularly a vectorized implementation) would be most appreciated!

Upvotes: 3

Views: 548

Answers (3)

Onyambu

Reputation: 79328

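import re

# Count occurrences of text_2_compare in text_1, row by row.
# Columns are accessed by name; re.escape treats the value as literal text.
df['compare_match'] = df.apply(lambda v: len(re.findall(re.escape(v['text_2_compare']), v['text_1'])), axis=1)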

df
   id text_1 text_2_compare  compare_match
0   1    yyy             yy              1
1   2    yxy             xx              0
2   3    zzy             zy              1
3   4    zzy              x              0
4   5    xyx             yx              1

EDIT:

I initially thought the OP needed the number of times text_2_compare appears in text_1, but on rereading the question, it seems the OP just wants an indicator variable. In that case a plain in check, as in @gaganso's answer, is sufficient.
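To illustrate the difference between a count and an indicator, here is a minimal sketch that rebuilds the question's sample data. Row 6 is a hypothetical addition (not in the original data), included only because it is a case where the two results differ. The column names "count" and "match" are likewise chosen for illustration.

```python
import re

import pandas as pd

# Sample data from the question, plus one hypothetical row (id 6)
# in which the substring occurs more than once.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "text_1": ["yyy", "yxy", "zzy", "zzy", "xyx", "yxyx"],
    "text_2_compare": ["yy", "xx", "zy", "x", "yx", "yx"],
})

# Number of non-overlapping occurrences of text_2_compare in text_1.
# re.escape makes the value match literally instead of as a regex.
df["count"] = df.apply(
    lambda row: len(re.findall(re.escape(row["text_2_compare"]), row["text_1"])),
    axis=1,
)

# 0/1 indicator: is text_2_compare a substring of text_1?
df["match"] = df.apply(
    lambda row: int(row["text_2_compare"] in row["text_1"]),
    axis=1,
)

print(df)
```

For the five original rows both columns agree; only the hypothetical row gives a count of 2 against an indicator of 1.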

Upvotes: 1

gaganso

Reputation: 3011

Building on @Onyambu's answer.

The in operator can be used in place of re.findall():

# 1 if text_2_compare is a substring of text_1, else 0
df["match"] = df.apply(lambda v: int(v["text_2_compare"] in v["text_1"]), axis=1)
print(df["match"])

Output:

0    1
1    0
2    1
3    0
4    1

Upvotes: 1

BENY

Reputation: 323366

Using a simple list comprehension:

df['New'] = [int(y in x) for x, y in zip(df['text_1'], df['text_2_compare'])]
df
Out[496]: 
   id text_1 text_2_compare  New
0   1    yyy             yy    1
1   2    yxy             xx    0
2   3    zzy             zy    1
3   4    zzy              x    0
4   5    xyx             yx    1
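The question mentions a NoneType error, which could come from missing values in either column, although the thread does not confirm that. Assuming that is the cause, the list-comprehension approach might be guarded as in the sketch below. The helper name contains and the extra row with a missing text_1 are hypothetical, added only to exercise the guard.

```python
import pandas as pd

# Sample data from the question, plus one hypothetical row (id 6)
# whose text_1 is missing.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "text_1": ["yyy", "yxy", "zzy", "zzy", "xyx", None],
    "text_2_compare": ["yy", "xx", "zy", "x", "yx", "x"],
})


def contains(haystack, needle):
    """Return 1 if needle is a substring of haystack, else 0.

    Any non-string value (None, NaN) counts as no match.
    """
    both_strings = isinstance(haystack, str) and isinstance(needle, str)
    return int(both_strings and needle in haystack)


df["match"] = [contains(x, y) for x, y in zip(df["text_1"], df["text_2_compare"])]
print(df)
```

Treating a missing value as a non-match is a design choice; depending on the data, leaving such rows as missing may be more appropriate.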

Upvotes: 0
