Reputation: 182
I have a dict of emotions (anger, fear, anticipation, trust, etc.) that maps each emotion to the words associated with it.
anticipationlist:
{'anticipation': ['abundance',
'opera',
'star',
'start',
'achievement',
'acquiring',...]
I also have a dataframe with rows of sentences. I want to find the words in each sentence that are associated with the emotion.
| text |
|--------------------------- |
| operation start yesterday |
| operation start now |
| operation achievement |
Expected output
| text | result |
|--------------------------- |------------- |
| operation start yesterday | start |
| operation start now | start |
| operation achievement | achievement |
I tried
df['result']=df["text"].str.findall(r"\b"+"|".join(anticipationlist) +r"\b").apply(", ".join)
my result is
| text | result |
|--------------------------- |-------------------- |
| operation start yesterday | opera, star |
| operation start now | opera, star |
| operation achievement | opera, achievement |
How can I improve my code to get the desired outcome?
Upvotes: 1
Views: 875
Reputation: 893
Here's an approach that doesn't use regex. Also, I changed your anticipationlist from a dict to a list.
import pandas as pd

anticipationlist = ['abundance',
                    'opera',
                    'star',
                    'start',
                    'achievement',
                    'acquiring',
                    ]
values = [
    'operation start yesterday',
    'operation start now',
    'operation achievement',
]
df = pd.DataFrame(data=values, columns=['text'])

def find_values(x):
    # Collect every whitespace-separated word of x that exactly equals a list entry.
    results = []
    for value in anticipationlist:
        for word in x.split():
            if word == value:
                results.append(word)
    return ' '.join(results)

df['result'] = df['text'].apply(find_values)
print(df.head())
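A possible variation on the function above, offered only as a sketch: keeping the words in a set makes each lookup constant-time and returns the matches in the order they appear in the sentence rather than in list order. The names anticipation_words and find_values_in_order are made up for illustration, and the set holds only the words visible in the question. The function could be passed to df['text'].apply in the same way as find_values.

```python
# Hypothetical names; only the words shown in the question are included.
anticipation_words = {'abundance', 'opera', 'star', 'start',
                      'achievement', 'acquiring'}

def find_values_in_order(text):
    """Return the whitespace-separated words of text that are in the set."""
    return ' '.join(word for word in text.split() if word in anticipation_words)

print(find_values_in_order('operation start yesterday'))  # start
print(find_values_in_order('operation achievement'))      # achievement
```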
Upvotes: 0
Reputation: 862406
You can add word boundaries to each value separately:
pat = '|'.join(r"\b{}\b".format(x) for x in anticipationlist)
df['result']=df["text"].str.findall(pat).apply(", ".join)
print(df)
                        text       result
0  operation start yesterday        start
1        operation start now        start
2      operation achievement  achievement
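To see why the pattern in the question matched opera and star: concatenating r"\b" with the joined words attaches the boundary only to the first and last alternatives, so every word in between can match inside a longer word. The sketch below reproduces this with plain re.findall, which is what Series.str.findall applies to each element. It also shows an alternative to per-word boundaries, wrapping the whole alternation in one non-capturing group. The variable names are illustrative, the word list is only the part shown in the question, and re.escape is an added precaution for words containing regex metacharacters.

```python
import re

words = ['abundance', 'opera', 'star', 'start', 'achievement', 'acquiring']
text = 'operation start yesterday'

# Pattern from the question: \b binds only to the first and last alternatives.
loose = r"\b" + "|".join(words) + r"\b"
print(re.findall(loose, text))    # ['opera', 'star']

# Grouped pattern: one \b pair around a non-capturing group covers every word.
grouped = r"\b(?:" + "|".join(re.escape(w) for w in words) + r")\b"
print(re.findall(grouped, text))  # ['start']
```

The grouped pattern can be passed to df["text"].str.findall in place of pat and should give the same result as the per-word version.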
Upvotes: 1