Pylander

Reputation: 1591

Pattern Match in List of Strings, Create New Column in pandas

I have a pandas dataframe with the following general format:

id,product_name_extract
1,00012CDN
2,14311121NDC
3,NDC37ba
4,47CD27

I also have a list of product codes that I would like to match against this column (unfortunately, the values come from NLP extraction, so the match will not be clean). I then want to create a new column holding the matching list value:

product_name = ['12CDN','21NDC','37ba','7CD2']

The desired output would be:

id,product_name_extract,product_name_mapped
1,00012CDN,12CDN
2,14311121NDC,21NDC
3,NDC37ba,37ba
4,47CD27,7CD2

I am not too worried about there being collisions.

This would be easy enough if I just needed a True/False indicator, using str.contains with the list values joined by "|" for alternation, but I am a bit stumped on how to create a column holding the exact matched value. Any tips or tricks appreciated!
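
For reference, the True/False version I have in mind would look roughly like the sketch below. It rebuilds the sample data from above so it runs on its own; the `has_match` column name is made up for illustration, and the `re.escape` call is an extra precaution in case a code ever contains a regex metacharacter.

```python
import re
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "product_name_extract": ["00012CDN", "14311121NDC", "NDC37ba", "47CD27"],
})
product_name = ["12CDN", "21NDC", "37ba", "7CD2"]

# Join the codes with | so the regex matches any one of them;
# re.escape makes each code match literally.
pattern = "|".join(re.escape(code) for code in product_name)

# has_match is a hypothetical column name: True where any code occurs.
df["has_match"] = df["product_name_extract"].str.contains(pattern)
```

This only says whether some code matched, not which one, which is the part I am missing.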

Upvotes: 3

Views: 1810

Answers (1)

sacuL

Reputation: 51395

Since you're not worried about collisions, you can join your product_name list with | to build a regex alternation, use str.findall to collect every match in each row, and take the first match with .str[0]:

df['product_name_mapped'] = (df.product_name_extract.str
                             .findall('|'.join(product_name))
                             .str[0])

Result:

>>> df
   id product_name_extract product_name_mapped
0   1             00012CDN               12CDN
1   2          14311121NDC               21NDC
2   3              NDC37ba                37ba
3   4               47CD27                7CD2
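
If the codes might contain regex metacharacters, or some rows might not match at all, a slightly more defensive variant could look like the sketch below. It is an alternative under stated assumptions rather than a required change: it swaps findall for `str.extract` with a single capture group, escapes each code with `re.escape`, and adds a fifth row ("XYZ") that is not in the question, purely to show the no-match case.

```python
import re
import pandas as pd

# Sample data from the question, plus one hypothetical non-matching row.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "product_name_extract": ["00012CDN", "14311121NDC", "NDC37ba", "47CD27", "XYZ"],
})
product_name = ["12CDN", "21NDC", "37ba", "7CD2"]

# Escape each code so characters such as . or + are matched literally,
# and wrap the alternation in one capture group for str.extract.
pattern = "(" + "|".join(re.escape(code) for code in product_name) + ")"

# expand=False returns a Series with the first match per row (NaN if none).
df["product_name_mapped"] = df["product_name_extract"].str.extract(pattern, expand=False)
```

Rows without any match end up as NaN, so they are easy to spot afterwards with isna(). If one code is a prefix of another, listing the longer code first makes the alternation prefer it when both could match at the same position.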

Upvotes: 5
