HellYeah

Reputation: 59

How to get the keyword that was matched from a list of keywords while searching in every row of a dataframe?

I have a column "Description" in my dataframe and I am searching this column for a list of keywords. I can already get True or False for each row depending on whether any keyword is present. I want to add one more column that shows which keyword from the list matched the data in that row.

For example:

content = ['paypal', 'silverline', 'bcg', 'onecap']

#dataframe df

Description        Debit  Keyword_present
onech xmx paypal   555    True
xxl 1ef yyy        141    False
bcg tte exact      411    True

And the new column should look like:

 Keyword
 paypal
 NA
 bcg

So far, I have only managed to get True/False values indicating whether any keyword is present.

#content is my list of keywords

present = new_df['Description'].str.contains('|'.join(content)) 

new_df['Keyword Present'] = present

Upvotes: 1

Views: 516

Answers (2)

mukulgarg94

Reputation: 51

If the words in Description are always separated by spaces, you could use something like:

content = ['paypal', 'silverline', 'bcg', 'onecap']
content = set(content)

# split each description on spaces and keep the words that are in the keyword set
df['keyword_matched'] = df['Description'].apply(lambda x: set(x.split(' ')).intersection(content))

This returns a set object for each row (an empty set when nothing matches), which you can modify as you like.

One advantage of this method is that it can return multiple matching keywords for the same row.
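If a plain string column is preferred over a column of sets, the matches can be joined into one value per row. This is only a minimal sketch: it rebuilds the sample data from the question, and the helper name `matched_keywords` is hypothetical, not part of pandas.

```python
import pandas as pd

content = {'paypal', 'silverline', 'bcg', 'onecap'}

# Sample data copied from the question.
df = pd.DataFrame({
    'Description': ['onech xmx paypal', 'xxl 1ef yyy', 'bcg tte exact'],
    'Debit': [555, 141, 411],
})

def matched_keywords(text):
    # Hypothetical helper: split on whitespace and keep the tokens
    # that appear in the keyword set.
    hits = sorted(set(text.split()) & content)
    # Join multiple matches with a comma; return None when nothing matched.
    return ', '.join(hits) if hits else None

df['keyword_matched'] = df['Description'].apply(matched_keywords)
print(df)
```

Sorting the matches keeps the joined string deterministic, since set ordering is not guaranteed.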

Upvotes: 0

Quang Hoang

Reputation: 150735

Instead of str.contains, use str.extract with a slightly different pattern that wraps the alternatives in a capture group:

pattern = '(' + '|'.join(content) + ')'
df['Keyword Present'] = df.Description.str.extract(pattern)

Output:

        Description  Debit  Keyword_present Keyword Present
0  onech xmx paypal    555             True          paypal
1       xxl 1ef yyy    141            False             NaN
2     bcg tte exact    411             True             bcg
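Two details may matter with real data: keywords that contain regex metacharacters (such as `.` or `+`), and keywords that appear inside longer words. The sketch below is one possible variation, assuming whole-word matching is wanted; it uses `re.escape` and `\b` word boundaries, and the column name `Keyword` is taken from the question's desired output.

```python
import re
import pandas as pd

content = ['paypal', 'silverline', 'bcg', 'onecap']

# Sample data copied from the question.
df = pd.DataFrame({
    'Description': ['onech xmx paypal', 'xxl 1ef yyy', 'bcg tte exact'],
    'Debit': [555, 141, 411],
})

# Escape each keyword so it is matched literally, and wrap the
# alternation in \b so that only whole words match (an assumption;
# drop the \b parts if substring matches are wanted).
pattern = r'\b(' + '|'.join(re.escape(k) for k in content) + r')\b'

# expand=False returns a Series when the pattern has one capture group.
df['Keyword'] = df['Description'].str.extract(pattern, expand=False)
print(df)
```

Note that `str.extract` returns only the first match in each row; `str.findall` with the same pattern would return every match as a list.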

Upvotes: 3
