Teuku Ario Maulana

Reputation: 1

How to find a substring from a list and return the matching substring instead of only True or False

Hi, I have a dataset that looks something like this:

import pandas as pd

dx = pd.DataFrame({'IDs':[1234,5346,1234,8793,8793],
                   'Names':['APPLE ABCD ONE','APPLE ABCD','NO STRAWBERRY YES','ORANGE AVAILABLE','TEA AVAILABLE']})

kw = ['APPLE', 'ORANGE', 'LEMONS', 'STRAWBERRY', 'BLUEBERRY', 'TEA COFFEE']
dx['Check']=dx['Names'].apply(lambda x: 1 if any(k in x for k in kw) else 0)

Instead of returning 1 or 0, I want the new column to contain the matching keyword from kw, such as 'APPLE', 'ORANGE' or 'TEA COFFEE'.

I hope someone can help me.

Thank you.

Upvotes: 0

Views: 37

Answers (2)

kale

Reputation: 36

Would this work? It returns a list of every keyword found in each name:

dx['Check']=dx['Names'].apply(lambda x: [k for k in kw if k in x ])
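If a single keyword per row is wanted rather than a list, the same idea can be adapted. The following is only a sketch: first_keyword is a hypothetical helper name (not part of pandas), and it assumes the first keyword in kw order should win when several match.

```python
import pandas as pd

dx = pd.DataFrame({'IDs': [1234, 5346, 1234, 8793, 8793],
                   'Names': ['APPLE ABCD ONE', 'APPLE ABCD', 'NO STRAWBERRY YES',
                             'ORANGE AVAILABLE', 'TEA AVAILABLE']})
kw = ['APPLE', 'ORANGE', 'LEMONS', 'STRAWBERRY', 'BLUEBERRY', 'TEA COFFEE']

def first_keyword(name, keywords):
    """Hypothetical helper: return the first keyword contained in name, or None."""
    return next((k for k in keywords if k in name), None)

# One keyword (or a missing value) per row instead of a list
dx['Check'] = dx['Names'].apply(lambda x: first_keyword(x, kw))
print(dx)
```

The last row stays empty because 'TEA AVAILABLE' does not contain the full keyword 'TEA COFFEE'.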

Upvotes: 0

mozway

Reputation: 260300

Use a regex with str.extract to benefit from vectorized speed:

import re

regex = '|'.join(map(re.escape, kw))
dx['Check'] = dx['Names'].str.extract(f'({regex})')

NB. This only returns the first match. If you want all matches, use extractall and perform an aggregation step.

output:

    IDs              Names       Check
0  1234     APPLE ABCD ONE       APPLE
1  5346         APPLE ABCD       APPLE
2  1234  NO STRAWBERRY YES  STRAWBERRY
3  8793   ORANGE AVAILABLE      ORANGE
4  8793      TEA AVAILABLE         NaN
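For the "all matches" case mentioned in the note, one possible sketch uses str.extractall followed by a groupby aggregation. The sixth row (ID 9999, 'APPLE AND ORANGE') is a hypothetical addition, not part of the original data, included only to show a row with more than one match. Joining with ', ' is also an assumption; any aggregation could be used.

```python
import re
import pandas as pd

dx = pd.DataFrame({'IDs': [1234, 5346, 1234, 8793, 8793, 9999],
                   'Names': ['APPLE ABCD ONE', 'APPLE ABCD', 'NO STRAWBERRY YES',
                             'ORANGE AVAILABLE', 'TEA AVAILABLE',
                             'APPLE AND ORANGE']})  # last row is hypothetical
kw = ['APPLE', 'ORANGE', 'LEMONS', 'STRAWBERRY', 'BLUEBERRY', 'TEA COFFEE']

regex = '|'.join(map(re.escape, kw))

# extractall yields one row per match, indexed by (original row, match number);
# column 0 holds the text captured by the single group
matches = dx['Names'].str.extractall(f'({regex})')[0]

# Aggregation step: collapse the matches of each original row into one string.
# Rows without any match are absent here and become NaN on assignment.
dx['Check'] = matches.groupby(level=0).agg(', '.join)
print(dx)
```

Using agg(list) instead of agg(', '.join) would keep the matches as Python lists.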

Upvotes: 1
