Matan Retzer

Reputation: 65

Python - Search and present strings within a column in dataframe from a list

How can I show, for each row, the words from a given list that are found in a text column?

If several words are found in the text column, I want them separated by ",".

Example:

I have the following list:

color_list = ['White','Yellow','Blue','Red']

which I need to search within a dataframe (df):

      doc  text
0    3000  'colors White Yellow'
1    3001  'Green Black'
2    3002  'I want the color Red'

and, for each row, put the matching words from the list into a new column:

      doc  text                    words
0    3000  'colors White Yellow'   'White, Yellow'
1    3001  'Green Black'
2    3002  'I want the color Red'  'Red'

I used the following code to extract the matching words, but it only returns one word per row:

df['words'] = df.text.str.extract('(?i)({0})'.format('|'.join(color_list)))

I can't figure out how to do this in Python (I have done it in R).

This specific issue is new because the challenge is to return more than one string from the list per row, not just a single value.

Thanks in advance for your help.

Upvotes: 0

Views: 235

Answers (1)

Wiktor Stribiżew

Reputation: 627536

You need to extract the words with str.findall and then join the results with ", ".join:

import pandas as pd

color_list = ['White','Yellow','Blue','Red']
df = pd.DataFrame({"doc": [3000, 3001, 3002], "text": ["colors White Yellow", "Green Black", "I want the color Red"]})
df['words'] = df['text'].str.findall(r'(?i)\b(?:{})\b'.format('|'.join(color_list))).apply(', '.join)

Output:

    doc                  text          words
0  3000   colors White Yellow  White, Yellow
1  3001           Green Black               
2  3002  I want the color Red            Red
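As a side note on why the original attempt returned a single word: str.extract keeps only the first match of each capture group, while str.findall returns every non-overlapping match as a list. A minimal sketch of the difference, using a hypothetical two-row Series `s` and passing flags=re.IGNORECASE instead of the inline (?i):

```python
import re
import pandas as pd

color_list = ['White', 'Yellow', 'Blue', 'Red']
alternation = '|'.join(color_list)

# Hypothetical sample data, for illustration only.
s = pd.Series(["colors White Yellow", "Green Black"])

# extract: first match of the capture group only (NaN when nothing matches)
first_only = s.str.extract(r'\b({})\b'.format(alternation),
                           flags=re.IGNORECASE, expand=False)

# findall: a list of all matches per row (empty list when nothing matches)
all_matches = s.str.findall(r'\b(?:{})\b'.format(alternation),
                            flags=re.IGNORECASE)

print(first_only.tolist())
print(all_matches.tolist())
```

Because findall yields an empty list for rows without a match, ', '.join turns those rows into empty strings rather than NaN.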

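This pattern assumes that every term in color_list consists only of word characters (letters, digits, underscores). Terms containing other characters would need to be escaped, and \b may not behave as expected next to them.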
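If some terms might contain non-word characters, one possible variation (a sketch, not tested against your real data) is to escape each term with re.escape, sort the terms longest first, and replace \b with lookarounds that check for the absence of an adjacent word character. The `terms` list and the sample rows below are hypothetical and only serve to illustrate the idea:

```python
import re
import pandas as pd

# Hypothetical terms containing non-word characters, for illustration only.
terms = ['White', 'Blue', 'Blue-Green', 'C++']

# Longest first so 'Blue-Green' is tried before 'Blue';
# re.escape neutralises regex metacharacters such as '+'.
alternation = '|'.join(re.escape(t) for t in sorted(terms, key=len, reverse=True))

# (?<!\w) and (?!\w) require that no word character touches the match,
# which also works when a term starts or ends with a non-word character.
pattern = r'(?i)(?<!\w)(?:{})(?!\w)'.format(alternation)

df = pd.DataFrame({"text": ["Blue-Green and white", "I like C++", "Bluebird"]})
df['words'] = df['text'].str.findall(pattern).apply(', '.join)
print(df)
```

Note that the matches are returned as they appear in the text (so "white" stays lowercase here), not in the spelling used in the list.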

Upvotes: 1
