Reputation: 303
My data frame looks like this:
id text
1 good,i am interested..please mail me.
2 call me...good to go with you
3 not interested...bye
4 i am not interested don't call me
5 price is too high so not interested
6 i have some requirement..please mail me
I want the data frame to look like this:
id text is_relevant
1 good,i am interested..please mail me. yes
2 call me...good to go with you yes
3 not interested...bye no
4 i am not interested don't call me no
5 price is too high so not interested no
6 i have some requirement..please mail me yes
I have tried the following code:
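d1 = {'no': ['Not interested','nt interested']}
# invert the dict: {'Not interested': 'no', 'nt interested': 'no'}
d = {k: oldk for oldk, oldv in d1.items() for k in oldv}
# Series.map looks up each cell's whole value, so rows that only contain a phrase get NaN -> 'yes'
df["is_relevant"] = df['new_text'].map(d).fillna('yes')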
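The attempt above labels every row 'yes' because Series.map matches a cell only when its entire value is a key of the dict; it does not search for substrings. A minimal sketch with two hypothetical rows shows the difference between map and str.contains:

```python
import pandas as pd

# Hypothetical sample rows, for illustration only
df = pd.DataFrame({'text': ["not interested...bye", "not interested"]})
d = {'not interested': 'no'}

# map: exact whole-value lookup, so only the second row is found
mapped = df['text'].map(d).fillna('yes')

# str.contains: substring search, so both rows match
contained = df['text'].str.contains('not interested')

print(mapped.tolist())     # ['yes', 'no']
print(contained.tolist())  # [True, True]
```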
Upvotes: 0
Views: 154
Reputation: 26676
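If all you want is to match what is in the list ['not interested', 'nt interested'], then just use np.where.
If the phrases are in a dict such as d1 from the question, pull them out into a list first (for example lst = d1['no']; list(d1.values()) would give a list of lists here) and still use np.where.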
import numpy as np

lst=['not interested', 'nt interested']
df['is_relevant']=np.where(df.text.str.contains("|".join(lst)),'no','yes')
text is_relevant
1 good,i am interested..please mail me. yes
2 call me...good to go with you yes
3 not interested...bye no
4 i am not interested don't call me no
5 price is too high so not interested no
6 i have some requirement..please mail me yes
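str.contains treats the joined string as a regular expression and is case-sensitive by default, so the question's 'Not interested' would not match the lowercase text, and a phrase containing '.' or '?' would be read as regex syntax. A minimal sketch, assuming the same text column, that escapes each phrase with re.escape and passes case=False:

```python
import re

import numpy as np
import pandas as pd

df = pd.DataFrame({'text': [
    "good,i am interested..please mail me.",
    "call me...good to go with you",
    "not interested...bye",
    "i am not interested don't call me",
    "price is too high so not interested",
    "i have some requirement..please mail me",
]}, index=[1, 2, 3, 4, 5, 6])

# Phrases as written in the question, including the capitalised one
lst = ['Not interested', 'nt interested']

# Escape each phrase so punctuation is matched literally, then OR them together
pattern = "|".join(re.escape(p) for p in lst)

# case=False lets 'Not interested' match 'not interested'
df['is_relevant'] = np.where(df['text'].str.contains(pattern, case=False), 'no', 'yes')
print(df)
```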
Upvotes: 0
Reputation: 326
This is similar to YOLO's answer above but allows for multiple text classes.
import pandas as pd
from functools import reduce

df = pd.DataFrame(
    data = ["good,i am interested..please mail me.",
            "call me...good to go with you",
            "not interested...bye",
            "i am not interested don't call me",
            "price is too high so not interested",
            "i have some requirement..please mail me"],
    columns=['text'], index=[1,2,3,4,5,6])
d1 = {'no': ['Not interested','nt interested','not interested'],
      'maybe': ['requirement']}
df['is_relevant'] = 'yes'
# for each class, OR together the matches of all its phrases and label those rows
for k in d1:
    match_inds = reduce(lambda x,y: x | y,
                        [df['text'].str.contains(pat) for pat in d1[k]])
    df.loc[match_inds, 'is_relevant'] = k
print(df)
Output
text is_relevant
1 good,i am interested..please mail me. yes
2 call me...good to go with you yes
3 not interested...bye no
4 i am not interested don't call me no
5 price is too high so not interested no
6 i have some requirement..please mail me maybe
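The same multi-class idea can also be written with numpy.select, which takes a list of boolean conditions and a parallel list of labels and returns, per row, the label of the first condition that is true (or default if none is). One behavioural difference: with np.select the earlier key in d1 wins when a row matches several classes, whereas in the loop above a later key overwrites an earlier one. A sketch, assuming the same d1 classes and text column:

```python
import re

import numpy as np
import pandas as pd

df = pd.DataFrame({'text': [
    "good,i am interested..please mail me.",
    "call me...good to go with you",
    "not interested...bye",
    "i am not interested don't call me",
    "price is too high so not interested",
    "i have some requirement..please mail me",
]}, index=[1, 2, 3, 4, 5, 6])

d1 = {'no': ['not interested', 'nt interested'],
      'maybe': ['requirement']}

# One boolean Series per class: True where any of that class's phrases occurs
conditions = [df['text'].str.contains("|".join(map(re.escape, pats)), case=False)
              for pats in d1.values()]
labels = list(d1.keys())

# First matching condition wins; rows matching nothing get 'yes'
df['is_relevant'] = np.select(conditions, labels, default='yes')
print(df)
```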
Upvotes: 0
Reputation: 21709
You can do:
d1 = {'no': ['not interested','nt interested']}
# create regex
reg = '|'.join([f'\\b{x}\\b' for x in list(d1.values())[0]])
# apply function
df['is_relevant'] = df['text'].str.lower().str.contains(reg).map({True: 'no', False: 'yes'})
print(df)
id text is_relevant
0 1 good,i am interested..please mail me. yes
1 2 call me...good to go with you yes
2 3 not interested...bye no
3 4 i am not interested don't call me no
4 5 price is too high so not interested no
5 6 i have some requirement..please mail me yes
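The \b anchors in the pattern above are word boundaries: each phrase must start and end at a word edge, so 'nt interested' no longer matches inside a longer word. A small check with Python's re module, using a hypothetical sentence in which 'want interested' ends in 'nt interested':

```python
import re

phrases = ['not interested', 'nt interested']

# Same pattern construction as above, with and without word boundaries
bounded = '|'.join(f'\\b{x}\\b' for x in phrases)
unbounded = '|'.join(phrases)

# Hypothetical sentence, for illustration only
s = "i want interested buyers only"

print(bool(re.search(unbounded, s)))  # True: 'nt interested' found inside 'want interested'
print(bool(re.search(bounded, s)))    # False: no word boundary before 'nt'
print(bool(re.search(bounded, "i am not interested don't call me")))  # True
```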
Upvotes: 1
Reputation: 17368
In [20]: df = pd.read_csv("a.csv")
In [21]: a
Out[21]: ['not interested', 'nt interested']
In [22]: df
Out[22]:
id text
0 1 good i am interested..please mail me.
1 2 call me...good to go with you
2 3 not interested...bye
3 4 i am not interested don't call me
4 5 price is too high so not interested
5 6 i have some requirement..please mail me
In [23]: df["is_relevant"] = df["text"].apply(lambda x: "no" if (a[0] in x.lower() or a[1] in x.lower()) else "yes")
In [24]: df
Out[24]:
id text is_relevant
0 1 good i am interested..please mail me. yes
1 2 call me...good to go with you yes
2 3 not interested...bye no
3 4 i am not interested don't call me no
4 5 price is too high so not interested no
5 6 i have some requirement..please mail me yes
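The lambda above hard-codes a[0] and a[1]; if the list of phrases can grow, any() over the whole list keeps the same substring logic without indexing. A sketch, assuming a holds lowercase phrases as in the session above:

```python
import pandas as pd

a = ['not interested', 'nt interested']

df = pd.DataFrame({'id': [1, 2, 3, 4, 5, 6], 'text': [
    "good i am interested..please mail me.",
    "call me...good to go with you",
    "not interested...bye",
    "i am not interested don't call me",
    "price is too high so not interested",
    "i have some requirement..please mail me",
]})

# 'no' if any phrase occurs as a substring of the lowercased text, else 'yes'
df['is_relevant'] = df['text'].apply(
    lambda x: 'no' if any(p in x.lower() for p in a) else 'yes')
print(df)
```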
Upvotes: 1