John Davis

Reputation: 303

How to match a partial string in a text column of a pandas DataFrame

My data frame looks like this:

id                               text
1         good,i am interested..please mail me.
2         call me...good to go with you
3         not interested...bye
4         i am not interested don't call me
5         price is too high so not interested
6         i have some requirement..please mail me

I want the data frame to look like this:

id                               text                          is_relevant
1         good,i am interested..please mail me.                    yes
2         call me...good to go with you                            yes
3         not interested...bye                                      no
4         i am nt interested don't call me                          no
5         price is too high so not interested                       no
6         i have some requirement..please mail me                   yes

I have tried the following code:

d1 = {'no': ['Not interested','nt interested']}
d = {k: oldk for oldk, oldv in d1.items() for k in oldv}
df["is_relevant"] = df['new_text'].map(d).fillna('yes')

Upvotes: 0

Views: 154

Answers (4)

wwnde

Reputation: 26676

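If all you need is to flag the rows that contain any phrase in the list ['not interested', 'nt interested'], join the phrases into a single pattern and use np.where.

If the phrases are the values of a dict, collect them into a list first, e.g. lst=list(d.values()) (flatten it if the values are themselves lists), and then use np.where in the same way.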

import numpy as np

lst = ['not interested', 'nt interested']
df['is_relevant'] = np.where(df['text'].str.contains("|".join(lst)), 'no', 'yes')

                                      text is_relevant
1    good,i am interested..please mail me.         yes
2            call me...good to go with you         yes
3                     not interested...bye          no
4        i am not interested don't call me          no
5      price is too high so not interested          no
6  i have some requirement..please mail me         yes
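
If the phrases come from a dict shaped like the question's d1 (each value is a list), the flattening step could be sketched as follows. This is only a sketch under a few assumptions: the column is named text, and re.escape and case=False are additions that are not part of the answer above. They make mixed-case entries such as 'Not interested' match and treat any regex metacharacters in the phrases literally.

```python
import re

import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "text": [
        "good,i am interested..please mail me.",
        "call me...good to go with you",
        "not interested...bye",
        "i am not interested don't call me",
        "price is too high so not interested",
        "i have some requirement..please mail me",
    ],
})

d1 = {"no": ["Not interested", "nt interested"]}

# Flatten the dict's lists of phrases into one flat list of strings.
lst = [phrase for phrases in d1.values() for phrase in phrases]

# Escape each phrase and join them into a single alternation pattern.
pattern = "|".join(re.escape(phrase) for phrase in lst)

# case=False makes the match case-insensitive (an assumption about the intent).
df["is_relevant"] = np.where(
    df["text"].str.contains(pattern, case=False), "no", "yes"
)
print(df)
```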

Upvotes: 0

proteome

Reputation: 326

This is similar to YOLO's answer but allows for multiple text classes.

import pandas as pd
from functools import reduce

df = pd.DataFrame(
    data = ["good,i am interested..please mail me.",
            "call me...good to go with you",
            "not interested...bye",
            "i am not interested don't call me",
            "price is too high so not interested",
            "i have some requirement..please mail me"],
    columns=['text'], index=[1,2,3,4,5,6])

d1 = {'no': ['Not interested','nt interested','not interested'],
      'maybe': ['requirement']}
df['is_relevant'] = 'yes'

for k in d1:
    match_inds = reduce(lambda x,y: x | y,
                        [df['text'].str.contains(pat) for pat in d1[k]])
    df.loc[match_inds, 'is_relevant'] = k

print(df)

Output

                                      text is_relevant
1    good,i am interested..please mail me.         yes
2            call me...good to go with you         yes
3                     not interested...bye          no
4        i am not interested don't call me          no
5      price is too high so not interested          no
6  i have some requirement..please mail me       maybe
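
For comparison, the same multi-class labelling can be sketched without reduce by building one pattern per class and passing the resulting masks to np.select. This is an alternative, not the method used above, and it rests on some assumptions: re.escape and case=False are additions, and np.select keeps the first class whose pattern matches (in dict order), whereas the loop above lets the last matching class win.

```python
import re

import numpy as np
import pandas as pd

df = pd.DataFrame(
    data=["good,i am interested..please mail me.",
          "call me...good to go with you",
          "not interested...bye",
          "i am not interested don't call me",
          "price is too high so not interested",
          "i have some requirement..please mail me"],
    columns=["text"], index=[1, 2, 3, 4, 5, 6])

d1 = {"no": ["Not interested", "nt interested", "not interested"],
      "maybe": ["requirement"]}

# One boolean mask per class: True where any phrase of that class occurs.
conditions = [
    df["text"]
    .str.contains("|".join(re.escape(p) for p in phrases), case=False)
    .to_numpy()
    for phrases in d1.values()
]

# Rows matching no class fall back to the default label 'yes'.
df["is_relevant"] = np.select(conditions, list(d1.keys()), default="yes")
print(df)
```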

Upvotes: 0

YOLO

Reputation: 21709

You can do:

d1 = {'no': ['not interested','nt interested']}

# create regex 
reg = '|'.join([f'\\b{x}\\b' for x in list(d1.values())[0]])

# apply function
df['is_relevant'] = df['text'].str.lower().str.contains(reg).map({True: 'no', False: 'yes'})
print(df)

   id                                     text is_relevant
0   1    good,i am interested..please mail me.         yes
1   2            call me...good to go with you         yes
2   3                     not interested...bye          no
3   4        i am not interested don't call me          no
4   5      price is too high so not interested          no
5   6  i have some requirement..please mail me         yes

Upvotes: 1

bigbounty

Reputation: 17368

In [20]: df = pd.read_csv("a.csv")

In [21]: a
Out[21]: ['not interested', 'nt interested']

In [22]: df
Out[22]:
   id                                     text
0   1    good i am interested..please mail me.
1   2            call me...good to go with you
2   3                     not interested...bye
3   4        i am not interested don't call me
4   5      price is too high so not interested
5   6  i have some requirement..please mail me

In [23]: df["is_relevant"] = df["text"].apply(lambda x: "no" if (a[0] in x.lower() or a[1] in x.lower()) else "yes")

In [24]: df
Out[24]:
   id                                     text is_relevant
0   1    good i am interested..please mail me.         yes
1   2            call me...good to go with you         yes
2   3                     not interested...bye          no
3   4        i am not interested don't call me          no
4   5      price is too high so not interested          no
5   6  i have some requirement..please mail me         yes
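
The lambda in In [23] hard-codes a[0] and a[1]. A sketch that handles a list of any length might look like this. It assumes a is the list shown at Out[21], builds the frame inline instead of reading a.csv, and uses a helper named label that is hypothetical and not part of the session above.

```python
import pandas as pd

a = ["not interested", "nt interested"]

# Built inline here; the session above reads the same rows from a.csv.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "text": [
        "good i am interested..please mail me.",
        "call me...good to go with you",
        "not interested...bye",
        "i am not interested don't call me",
        "price is too high so not interested",
        "i have some requirement..please mail me",
    ],
})


def label(text, phrases=a):
    """Return 'no' if any phrase occurs in the lower-cased text, else 'yes'."""
    lowered = text.lower()
    return "no" if any(phrase in lowered for phrase in phrases) else "yes"


df["is_relevant"] = df["text"].apply(label)
print(df)
```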

Upvotes: 1
