I have a pandas DataFrame like the one below, with a text string in each row:
Text Col
-----------
I have an apple.
She eats orange.
Tom likes banana and orange
I would like to extract the keywords from "Text Col" and assign them as the value of another column, "KeyWord":
Text Col                        KeyWord
-----------------------------------------------------
I have an apple.                apple
She eats orange.                orange
Tom likes banana and orange     banana, orange
I only know how to check whether the string contains specific keywords:
df['Text Col'].str.contains('apple|orange|banana')
but I don't know how to put the matched keyword(s) into another column.
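To see why str.contains alone does not answer the question, here is a minimal sketch that rebuilds the sample frame (only the three rows shown above) and inspects what it returns: a boolean mask saying whether each row matched, with no record of which keyword it was.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({'Text Col': ['I have an apple.',
                                'She eats orange.',
                                'Tom likes banana and orange']})

# str.contains only answers "is there a match?", giving one bool per row
mask = df['Text Col'].str.contains('apple|orange|banana')
print(mask.tolist())  # [True, True, True]
```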
I have searched but didn't find a similar question. Could someone please help me with this?
Many thanks!
Use .str.extract to get the first match in each row, e.g.:
df['Text Col'].str.extract('(apple|orange|banana)', expand=False)
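A minimal sketch applying this to the question's sample rows, assuming the target column is named KeyWord as in the question. With a single capture group and expand=False, .str.extract returns a Series holding only the first match in each row (rows with no match get NaN), so the third row yields banana and loses orange:

```python
import pandas as pd

df = pd.DataFrame({'Text Col': ['I have an apple.',
                                'She eats orange.',
                                'Tom likes banana and orange']})

# One capture group + expand=False -> a Series with the FIRST match per row
df['KeyWord'] = df['Text Col'].str.extract('(apple|orange|banana)', expand=False)
print(df['KeyWord'].tolist())  # ['apple', 'orange', 'banana']
```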
or .extractall followed by .unstack if you expect more than one match per row:
matches = df['Text Col'].str.extractall('(apple|orange|banana)').unstack()
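A small sketch of what extractall followed by unstack produces for the same three rows: one column per match position, with rows that have fewer matches padded with NaN. Those missing values are what the joining step has to cope with.

```python
import pandas as pd

df = pd.DataFrame({'Text Col': ['I have an apple.',
                                'She eats orange.',
                                'Tom likes banana and orange']})

# extractall: one row per (original row, match number);
# unstack moves the match number into the columns
matches = df['Text Col'].str.extractall('(apple|orange|banana)').unstack()
print(matches)
print(matches.iloc[0].tolist())  # ['apple', nan] -> shorter rows are NaN-padded
```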
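You'll then need to join the matches in each row. If your dataset is small, you can do this in pure Python; fill the NaN padding left by .unstack first, because NaN is truthy and filter(None, ...) would otherwise keep it:
# NaN -> '' so that filter(None, ...) drops the padding before joining
df['extracted'] = [','.join(filter(None, li)) for li in matches.fillna('').values]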
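One caveat with assigning a plain list: extractall only emits rows that matched, so if any row has no keyword the list is shorter than df and the assignment raises a ValueError. A sketch that keeps the index instead, using a hypothetical fourth row with no keyword and ', ' as the separator to match the question's desired output:

```python
import pandas as pd

df = pd.DataFrame({'Text Col': ['I have an apple.',
                                'She eats orange.',
                                'Tom likes banana and orange',
                                'Nothing to see here']})  # hypothetical row with no keyword

matches = df['Text Col'].str.extractall('(apple|orange|banana)').unstack()

# Replace NaN padding with '' (NaN is truthy), drop empties, join each row
joined = pd.Series([', '.join(filter(None, row)) for row in matches.fillna('').values],
                   index=matches.index)

# Assigning a Series aligns on the index, so the unmatched row becomes NaN
df['KeyWord'] = joined
print(df['KeyWord'].tolist())
```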
If you insist on doing this in pandas, you can use a loop over the columns, though it looks messy:
df['extracted'] = ''
# DataFrame.iteritems() was removed in pandas 2.0; .items() is the replacement
for _, col in matches.fillna('').items():
    df['extracted'] += col + ','
df['extracted'] = df['extracted'].str.rstrip(',')
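A more compact alternative, not part of the original answer: Series.str.findall returns a list of every match in each row, and Series.str.join concatenates each list with a separator. A sketch on the sample rows, using ', ' to match the question's KeyWord format; a row with no match would get an empty string rather than NaN:

```python
import pandas as pd

df = pd.DataFrame({'Text Col': ['I have an apple.',
                                'She eats orange.',
                                'Tom likes banana and orange']})

# findall -> list of all non-overlapping matches per row; str.join flattens each list
df['KeyWord'] = df['Text Col'].str.findall('apple|orange|banana').str.join(', ')
print(df['KeyWord'].tolist())  # ['apple', 'orange', 'banana, orange']
```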