Winds

Reputation: 397

Python pandas | how to assign keywords extracted from a column to another column?

I have a pandas DataFrame like the one below, with a text string in each row:

  Text Col
-----------
I have an apple.
She eats orange.
Tom likes banana and orange

I would like to extract the keywords from "Text Col" and assign them as the value of another column, "KeyWord":

  Text Col                              KeyWord
-----------------------------------------------------
I have an apple.                        apple
She eats orange.                        orange
Tom likes banana and orange             banana, orange

I know that I can check whether a string contains specific keywords with df['Text Col'].str.contains('apple|orange|banana'), but I don't know how to assign the keywords that were found to another column.

I have googled but didn't find any similar question. Could someone please help me with this?

Many Thanks!

Upvotes: 1

Views: 1003

Answers (1)

Ken Wei

Reputation: 3130

Use .str.extract, e.g.

df['Text Col'].str.extract('(apple|orange|banana)', expand = False)

or .extractall followed by .unstack if you expect more than one match:

matches = df['Text Col'].str.extractall('(apple|orange|banana)').unstack()

You'll need to join them; if your dataset is small, you can do this in pure Python:

# fill NaN with '' first: NaN is truthy, so filter(None, ...) would not drop it
df['extracted'] = [','.join(filter(None, li)) for li in matches.fillna('').values]
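To make the join step concrete, here is a minimal sketch run against the three sample rows from the question. The only change from the line above is wrapping the result in a Series keyed by matches.index, which is an assumption on my part to keep the assignment aligned if some rows had no match at all (such rows are missing from the extractall result).

```python
import pandas as pd

# Sample data taken from the question
df = pd.DataFrame({'Text Col': ['I have an apple.',
                                'She eats orange.',
                                'Tom likes banana and orange']})

# One row per original row, one column per match; missing matches are NaN
matches = df['Text Col'].str.extractall('(apple|orange|banana)').unstack()

# Replace NaN with '' so filter(None, ...) drops the gaps, then join per row.
# Keeping matches.index lets pandas align the result with df on assignment.
joined = pd.Series(
    [','.join(filter(None, row)) for row in matches.fillna('').values],
    index=matches.index,
)
df['extracted'] = joined

print(df)
```

Rows that contain none of the keywords would end up as NaN in 'extracted' with this approach.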

If you insist on doing this in pandas, you can use a loop over the columns, though it looks messy:

df['extracted'] = ''
for _, col in matches.fillna('').items():  # .iteritems() was removed in pandas 2.0
    df['extracted'] += col + ','
df['extracted'] = df['extracted'].str.rstrip(',')
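As a possible alternative to the approaches above (not part of the original answer), .str.findall followed by .str.join can do the same thing in one chained call. This is only a sketch: the extra "No fruit here" row is made up to show what happens when nothing matches, and the ', ' separator is chosen to match the desired output in the question.

```python
import pandas as pd

# Sample data from the question, plus one hypothetical row with no keyword
df = pd.DataFrame({'Text Col': ['I have an apple.',
                                'She eats orange.',
                                'Tom likes banana and orange',
                                'No fruit here']})

# findall returns a list of every match per row;
# .str.join then concatenates each list into a single string
df['KeyWord'] = df['Text Col'].str.findall('apple|orange|banana').str.join(', ')

print(df)
```

Rows without any keyword get an empty string here rather than NaN, since findall returns an empty list for them.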

Upvotes: 2
