Reputation: 1591
My question is similar to this one: How to check whether the content of Column A is contained in Column B using Python DataFrame?
Unfortunately, the accepted answer raises a NoneType error in my case.
I have a pandas dataframe in the following format:
id,text_1,text_2_compare
1,yyy,yy
2,yxy,xx
3,zzy,zy
4,zzy,x
5,xyx,yx
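To make the example reproducible, the sample above can be loaded with pd.read_csv and io.StringIO. This is a minimal sketch; the csv_text variable name is just for illustration.

```python
import io

import pandas as pd

# Sample data copied from the question, read from an in-memory CSV string
csv_text = """id,text_1,text_2_compare
1,yyy,yy
2,yxy,xx
3,zzy,zy
4,zzy,x
5,xyx,yx
"""
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```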
I would like to check, row by row, whether "text_2_compare" is contained in "text_1" and store the result in a new indicator column:
id,text_1,text_2_compare,match
1,yyy,yy,1
2,yxy,xx,0
3,zzy,zy,1
4,zzy,x,0
5,xyx,yx,1
Any tips or tricks (particularly a vectorized implementation) would be most appreciated!
Upvotes: 3
Views: 548
Reputation: 79328
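import re
# Count non-overlapping occurrences; re.escape makes the search text literal
df['compare_match'] = df.apply(
    lambda row: len(re.findall(re.escape(row['text_2_compare']), row['text_1'])), axis=1)
df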
id text_1 text_2_compare compare_match
0 1 yyy yy 1
1 2 yxy xx 0
2 3 zzy zy 1
3 4 zzy x 0
4 5 xyx yx 1
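Note that re.findall counts non-overlapping matches, so "yy" in "yyy" counts once even though it starts at two positions. If counts rather than an indicator are wanted, here is a small sketch contrasting plain str.count (non-overlapping, no regex involved) with a zero-width lookahead that counts overlapping occurrences. The non_overlapping and overlapping column names are made up for illustration.

```python
import re

import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "text_1": ["yyy", "yxy", "zzy", "zzy", "xyx"],
    "text_2_compare": ["yy", "xx", "zy", "x", "yx"],
})
pairs = list(zip(df["text_1"], df["text_2_compare"]))

# Non-overlapping count of a literal substring (same rule re.findall uses)
df["non_overlapping"] = [x.count(y) for x, y in pairs]

# Overlapping count: a zero-width lookahead can match at every start position
df["overlapping"] = [len(re.findall(f"(?={re.escape(y)})", x)) for x, y in pairs]

# Any count above zero collapses to a 0/1 indicator
df["match"] = (df["non_overlapping"] > 0).astype(int)
print(df)
```

Row 1 ("yyy" vs "yy") is where the two counts differ: 1 non-overlapping, 2 overlapping.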
EDIT:
I originally thought the OP needed the number of times text_2_compare appears in text_1, but on rereading the question, the OP only wants an indicator variable. Thus using v[2] in v[1], as @gaganso does, is sufficient.
Upvotes: 1
Reputation: 3011
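Building on @Onyambu's answer, the in operator can be used in place of re.findall():
# Look up columns by label; positional v[2] on a row Series is deprecated in recent pandas
df["match"] = df.apply(lambda v: int(v["text_2_compare"] in v["text_1"]), axis=1)
print(df["match"])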
Output:
0 1
1 0
2 1
3 0
4 1
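The NoneType error mentioned in the question is plausibly caused by missing values: if a cell holds None, v[2] in v[1] raises a TypeError such as "argument of type 'NoneType' is not iterable". Assuming that is the cause, here is a sketch that treats any non-string cell as a non-match. safe_contains is a hypothetical helper, and the sample frame with missing values is invented for the demo.

```python
import pandas as pd

# Invented frame with missing values (assumed source of the NoneType error)
df = pd.DataFrame({
    "text_1": ["yyy", None, "zzy", "zzy", "xyx"],
    "text_2_compare": ["yy", "xx", None, "x", "yx"],
})

def safe_contains(haystack, needle):
    """Return 1 if needle is a substring of haystack, else 0.

    Hypothetical helper: any non-string value (None, NaN) counts as no match.
    """
    if isinstance(haystack, str) and isinstance(needle, str):
        return int(needle in haystack)
    return 0

df["match"] = [safe_contains(x, y) for x, y in zip(df["text_1"], df["text_2_compare"])]
print(df)
```

Whether a missing value should count as 0 or stay missing is a judgment call; returning pd.NA instead of 0 in the fallback branch would keep that information.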
Upvotes: 1
Reputation: 323366
Using a simple list comprehension:
df['New'] = [int(y in x) for x, y in zip(df['text_1'], df['text_2_compare'])]
df
Out[496]:
id text_1 text_2_compare New
0 1 yyy yy 1
1 2 yxy xx 0
2 3 zzy zy 1
3 4 zzy x 0
4 5 xyx yx 1
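For the vectorized approach the question asks about, NumPy's np.char.find can compare the two columns element-wise; it returns -1 wherever the substring is not found. This is a sketch assuming both columns hold plain strings: astype(str) would turn a missing value into the literal text "None" or "nan", so clean those first. It still processes the strings one by one internally, so it is not guaranteed to beat the list comprehension; it is worth timing on real data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "text_1": ["yyy", "yxy", "zzy", "zzy", "xyx"],
    "text_2_compare": ["yy", "xx", "zy", "x", "yx"],
})

# Convert both columns to fixed-width NumPy unicode arrays
haystack = df["text_1"].to_numpy().astype(str)
needle = df["text_2_compare"].to_numpy().astype(str)

# np.char.find pairs the arrays element-wise and gives -1 where the
# substring is absent, so ">= 0" is the containment test
positions = np.char.find(haystack, needle)
df["match"] = (positions >= 0).astype(int)
print(df)
```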
Upvotes: 0