Reputation: 59
I have a column "Description" in my dataframe, and I am searching this column for a list of keywords. I can already return True or False depending on whether a keyword is present in a given row. I want to add one more column that shows which keyword from the list matched the data in that row.
For example:
content = ['paypal', 'silverline', 'bcg', 'onecap']
#dataframe df
Description Debit Keyword_present
onech xmx paypal 555 True
xxl 1ef yyy 141 False
bcg tte exact 411 True
And the new column should look like:
Keyword
paypal
NA
bcg
So far, I have only managed to get True/False values indicating whether any of the keywords is present.
#content is my list of keywords
present = new_df['Description'].str.contains('|'.join(content))
new_df['Keyword Present'] = present
Upvotes: 1
Views: 516
Reputation: 51
If the values in Description are always separated by spaces, you could use something like:
content = ['paypal', 'silverline', 'bcg', 'onecap']
content = set(content)
df['keyword_matched'] = df['Description'].apply(lambda x: set(x.split(' ')).intersection(content))
This returns a set object for each row, which you can modify as you like.
One advantage of this method is that it can return multiple matching keywords.
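As a rough, self-contained sketch of the idea above: the sample rows are copied from the question, and the step that joins each set into a string is my own addition for illustration, not something the question asked for. It assumes keywords appear as whole whitespace-separated words.

```python
import pandas as pd

content = {'paypal', 'silverline', 'bcg', 'onecap'}

# Sample data taken from the question
df = pd.DataFrame({
    'Description': ['onech xmx paypal', 'xxl 1ef yyy', 'bcg tte exact'],
    'Debit': [555, 141, 411],
})

# split() with no argument splits on any run of whitespace
matches = df['Description'].apply(lambda text: set(text.split()) & content)

# Assumption: a comma-separated string is wanted; rows without a match become missing values
df['keyword_matched'] = matches.apply(lambda s: ', '.join(sorted(s)) if s else None)
```

Note that this only matches whole words, so a keyword embedded in a longer token (for example "paypal123") would not be found.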
Upvotes: 0
Reputation: 150735
Instead of contains, use extract with a slightly different pattern:
pattern = '(' + '|'.join(content) + ')'
df['Keyword Present'] = df.Description.str.extract(pattern)
Output:
Description Debit Keyword_present Keyword Present
0 onech xmx paypal 555 True paypal
1 xxl 1ef yyy 141 False NaN
2 bcg tte exact 411 True bcg
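For anyone who wants to run this end to end, here is a minimal sketch using the sample rows from the question. Two details are my own additions rather than part of the answer: `re.escape` guards against keywords that contain regex metacharacters, and `expand=False` makes `str.extract` return a Series instead of a one-column DataFrame. The column name `Keyword` is taken from the desired output in the question.

```python
import re
import pandas as pd

content = ['paypal', 'silverline', 'bcg', 'onecap']

# Sample data taken from the question
df = pd.DataFrame({
    'Description': ['onech xmx paypal', 'xxl 1ef yyy', 'bcg tte exact'],
    'Debit': [555, 141, 411],
})

# One capture group containing all keywords as alternatives
pattern = '(' + '|'.join(re.escape(k) for k in content) + ')'

# extract returns the first match per row, or a missing value when nothing matches
df['Keyword'] = df['Description'].str.extract(pattern, expand=False)
```

Because `extract` keeps only the first match, a row containing several keywords would show just one of them; `str.findall` with the same pattern is an option if all matches are needed.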
Upvotes: 3