Sean.H
Sean.H

Reputation: 682

Pandas, check if a column contains characters from another column, and mark out the character?

There are 2 Dataframe, df1 & df2. e.g.

df1 = pd.DataFrame({'index': [1, 2, 3, 4],
                    'col1': ['12abc12', '12abcbla', 'abc', 'jh']})

df2 = pd.DataFrame({'col2': ['abc', 'efj']})

what i want looks like this (find all the rows which contains the character from df2, and tag them out)

   index      col1      col2
0      1   12abc12       abc
1      2  12abcbla       abc
2      3       abc       abc
3      4        jh       

I've found a similar question but not exactly what i want. Thx for any ideas in advance.

Upvotes: 1

Views: 549

Answers (1)

jezrael
jezrael

Reputation: 863116

Use Series.str.extract if need first matched value:

df1['new'] = df1['col1'].str.extract(f'({"|".join(df2["col2"])})', expand=False).fillna('')
print (df1)
   index      col1  new
0      1   12abc12  abc
1      2  12abcbla  abc
2      3       abc  abc
3      4        jh     

If need all matched values use Series.str.findall and Series.str.join:

df1 = pd.DataFrame({'index': [1, 2, 3, 4],
                    'col1': ['12abc1defj2', '12abcbla', 'abc', 'jh']})

df2 = pd.DataFrame({'col2': ['abc', 'efj']})

df1['new'] = df1['col1'].str.findall("|".join(df2["col2"])).str.join(',')
print (df1)
   index         col1      new
0      1  12abc1defj2  abc,efj
1      2     12abcbla      abc
2      3          abc      abc
3      4           jh         

Upvotes: 2

Related Questions