Reputation: 65
How can I show, in Python, which words from a given list are found in a text column, for each row?
If several words are found in a row's text, I want them listed together, separated by ",".
Example:
I have the following list:
color_list = ['White','Yellow','Blue','Red']
which I need to search within a dataframe (df):
    doc  text
0  3000  'colors White Yellow'
1  3001  'Green Black'
2  3002  'I want the color Red'
and add a new column that holds, for each row, the matching words from the list:
    doc  text                    words
0  3000  'colors White Yellow'   'White, Yellow'
1  3001  'Green Black'
2  3002  'I want the color Red'  'Red'
I used the following code to extract the matching word, but it returns only one word per row:
df['words'] = df.text.str.extract('(?i)({0})'.format('|'.join(color_list )))
I can't figure out how to do this in Python (I have done it in R).
This question differs from similar ones because I need to return every matching string from the list for each row, not just a single value.
Thanks in advance for your help.
Upvotes: 0
Views: 235
Reputation: 627536
str.extract returns only the first match in each string. To collect every match, extract the words with str.findall and then join the results with ", ".join:
import pandas as pd
color_list = ['White','Yellow','Blue','Red']
df = pd.DataFrame({"doc": [3000, 3001, 3002], "text": ["colors White Yellow", "Green Black", "I want the color Red"]})
df['words'] = df['text'].str.findall(r'(?i)\b(?:{})\b'.format('|'.join(color_list))).apply(', '.join)
Output:
    doc                  text          words
0  3000   colors White Yellow  White, Yellow
1  3001           Green Black
2  3002  I want the color Red            Red
This assumes all the terms in color_list consist of only word characters.
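If that assumption does not hold, one possible workaround is to escape each term with re.escape and replace \b with lookarounds that check for a neighbouring word character. The sketch below is only an illustration: the 'C++' term and the extra row with doc 3003 are hypothetical additions used to show a term containing non-word characters, and sorting the terms longest first is an assumption made so that longer terms are tried before shorter ones that share a prefix.

```python
import re
import pandas as pd

# 'C++' is a hypothetical term, added only to show non-word characters.
color_list = ['White', 'Yellow', 'Blue', 'Red', 'C++']

df = pd.DataFrame({
    "doc": [3000, 3001, 3002, 3003],  # row 3003 is a hypothetical extra row
    "text": [
        "colors White Yellow",
        "Green Black",
        "I want the color Red",
        "written in C++",
    ],
})

# Escape regex metacharacters so each term is matched literally,
# and try longer terms first.
terms = sorted(color_list, key=len, reverse=True)
alternation = '|'.join(re.escape(t) for t in terms)

# (?<!\w) and (?!\w) require that no word character touches the match,
# which also works when a term starts or ends with a non-word character.
pattern = r'(?i)(?<!\w)(?:{})(?!\w)'.format(alternation)

df['words'] = df['text'].str.findall(pattern).apply(', '.join)
print(df)
```

With plain \b, a term such as 'C++' would not match at the end of a string, because \b needs a word character on one side of the boundary.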
Upvotes: 1