How to get the specific word from str.contains

Question

I have a pandas DataFrame with an ID column and a text column. I am trying to categorize the records with str.contains, and I need each word that str.contains matched in the text returned in its own column. I am using Python 3 and pandas. My df is as follows:

ID  Text
1   The cricket world cup 2019 has begun
2   I am eagrly waiting for the cricket worldcup 2019 
3   I will try to watch all the mathes my favourite teams playing in the cricketworldcup 2019
4   I love cricket to watch and badminton to play


searchfor = ['cricket', 'world cup', '2019']
df['text'].str.contains('|'.join(searchfor))

The output I need:

ID  Text                                                                                       phrase1  phrase2    phrase3
1   The cricket world cup 2019 has begun                                                       cricket  world cup  2019
2   I am eagrly waiting for the cricket worldcup 2019                                          cricket  world cup  2019
3   I will try to watch all the mathes my favourite teams playing in the cricketworldcup 2019  cricket  world cup  2019
4   I love cricket to watch and badminton to play                                              cricket

Mohit Motwani · Accepted Answer

You can use np.where:

import numpy as np
search_for = ['cricket', 'world cup', '2019']

# Add one column per search term, holding the term wherever the text contains it
for word in search_for:
    df[word] = np.where(df.text.str.contains(word), word, np.nan)

df


    text                                                cricket  world cup  2019
1   The cricket world cup 2019 has begun                cricket  world cup  2019
2   I am eagrly waiting for the cricket worldcup 2019   cricket  nan        2019
3   I will try to watch all the mathes my favourit...   cricket  nan        2019
4   I love cricket to watch and badminton to play       cricket  nan        nan
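
For anyone who wants to run this end to end, here is a self-contained sketch of a closely related approach. It rebuilds the sample frame from the question (the lowercase column name text and the 1-based index are assumptions taken from the output above) and swaps np.where for Series.str.extract, so rows without a match hold a missing value. Treat it as one possible variant, not as the method used in the answer.

```python
import re

import pandas as pd

# Sample data copied from the question; the column name "text" is an assumption.
df = pd.DataFrame(
    {
        "text": [
            "The cricket world cup 2019 has begun",
            "I am eagrly waiting for the cricket worldcup 2019",
            "I will try to watch all the mathes my favourite teams "
            "playing in the cricketworldcup 2019",
            "I love cricket to watch and badminton to play",
        ]
    },
    index=[1, 2, 3, 4],
)

search_for = ["cricket", "world cup", "2019"]

for word in search_for:
    # re.escape keeps the search term literal; the parentheses form the
    # capture group that str.extract returns (missing where nothing matches).
    pattern = f"({re.escape(word)})"
    df[word] = df["text"].str.extract(pattern, expand=False)

print(df[search_for])
```

Like the loop in the answer, this only finds literal matches, so "worldcup" without the space is not reported as "world cup".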

Syntax of np.where: np.where(condition[, x, y]). It works element-wise: wherever condition is True the result takes the value from x, otherwise the value from y.
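
A tiny illustration of that element-wise selection, using made-up arrays rather than the question's data:

```python
import numpy as np

# Hypothetical inputs, chosen only to show which branch each element takes.
condition = np.array([True, False, True])
x = np.array([10, 20, 30])
y = np.array([-1, -2, -3])

# Takes x[i] where condition[i] is True, y[i] otherwise.
picked = np.where(condition, x, y)

# Scalars are broadcast, which is how the answer passes a single word.
flags = np.where(condition, 1, 0)

print(picked.tolist(), flags.tolist())
```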
