stjepan

Reputation: 35

Pandas: find any matches between a list of strings and a DataFrame column's string values to create a new column?

I have a list of strings. For each row of my dataframe, I need to check whether one or more of the list items occur as a substring of the string value in one column. The matched value(s) should then go into a new column, or NaN if there is no match. I need all matched parts of the string, not just the first one. So for the third row of my df, the result would be both 'E' and 'F22'.

import pandas as pd

df = pd.DataFrame({'type': ['A23 E I28', 'I28 F A23', 'D41 E F22']})
matches = ['E', 'F22']

Upvotes: 2

Views: 3770

Answers (2)

Hryhorii Pavlenko

Reputation: 3910

Is this what you're looking for?

If there's a match, the first matching keyword is assigned to a new column; rows without a match get NaN:

df['new_col'] = df['type'].str.extract(f"({'|'.join(matches)})")

        type new_col
0  A23 E I28       E
1  I28 F A23     NaN
2  D41 E F22       E

Edit: to collect all matches instead of only the first one, use findall:

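import numpy as np

# findall returns a list of all hits per row; join them, then turn empty strings into NaN
df['new_col'] = (df['type']
                 .str.findall(f"({'|'.join(matches)})")
                 .str.join(', ')
                 .replace('', np.nan))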

        type new_col
0  A23 E I28       E
1  I28 F A23     NaN
2  D41 E F22  E, F22
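
One caveat with joining the keywords straight into a regex: a keyword containing characters such as '.' or '+' would be interpreted as a pattern rather than matched literally. The following is a minimal sketch of the same findall approach with the keywords escaped first. The `pattern` and `found` variable names and the longest-first sort are my own additions for illustration, not part of the code above:

```python
import re

import numpy as np
import pandas as pd

df = pd.DataFrame({'type': ['A23 E I28', 'I28 F A23', 'D41 E F22']})
matches = ['E', 'F22']

# Escape each keyword so regex metacharacters are matched literally.
# Sorting longest-first is an extra precaution (an assumption, not required
# for this data) so a longer keyword is tried before a shorter one that
# shares its prefix.
pattern = '|'.join(re.escape(m) for m in sorted(matches, key=len, reverse=True))

found = df['type'].str.findall(pattern)  # one list of hits per row
df['new_col'] = found.str.join(', ').replace('', np.nan)

print(df)
```

For the sample data this should produce the same new_col values as the unescaped version; the difference only shows up once a keyword contains special characters.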

Upvotes: 2

ivallesp

Reputation: 2222

I would do it this way:

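# set(s) would compare single characters, so 'F22' could never match; test each keyword as a substring instead
df["match"] = df["type"].map(lambda s: ", ".join(m for m in matches if m in s))
# rows where no keyword occurs get NaN instead of an empty string
df.loc[~df["type"].str.contains("|".join(matches)), "match"] = np.nan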
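
A set intersection only behaves as intended on whole tokens, because iterating over a string yields single characters. Below is a rough sketch of a token-based variant, assuming the values in `type` are whitespace-separated codes and that whole-token matches (rather than arbitrary substrings) are acceptable. The helper name `matched_tokens` is hypothetical:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'type': ['A23 E I28', 'I28 F A23', 'D41 E F22']})
matches = ['E', 'F22']


def matched_tokens(text, keywords):
    """Return the keywords that occur as whole tokens in text, or NaN if none do."""
    tokens = set(text.split())
    # Iterate over keywords (not the set) so the output order is deterministic.
    hits = [k for k in keywords if k in tokens]
    return ', '.join(hits) if hits else np.nan


df['match'] = df['type'].map(lambda s: matched_tokens(s, matches))

print(df)
```

Unlike a substring test, this would not report 'E' for a hypothetical code such as 'E99', so which variant fits depends on whether partial matches are wanted.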

Upvotes: 0
