Kevin Nash

Reputation: 1561

Pandas - Extract all content based on certain keywords

I am trying to extract all the content from a DataFrame column up to the point where a specific word appears. The text should be cut off at the first occurrence of any of the following words:

high, medium, low

Sample of the text in the DataFrame:

text
Ticket creation dropped in last 24 hours medium range for cust_a
Calls dropped in last 3 months high range for cust_x

Expected output:

text, new_text
Ticket creation dropped in last 24 hours medium range for cust_a, Ticket creation dropped in last 24 hours
Calls dropped in last 3 months high range for cust_x, Calls dropped in last 3 months

Upvotes: 1

Views: 44

Answers (1)

Umar.H

Reputation: 23099

You need str.replace with a regular expression.

The idea is to match any word from your list and replace it, along with everything after it, with an empty string.

We use .* to match everything from the keyword to the end of the string:

words = 'high, medium, low'
# Build a regex alternation from the comma-separated list
match_words = '|'.join(words.split(', '))
# 'high|medium|low'

# Remove the first keyword found and everything after it
df['new_text'] = df['text'].str.replace(f"({match_words}).*", '', regex=True)


print(df['new_text'])

0    Ticket creation dropped in last 24 hours 
1              Calls dropped in last 3 months 
Name: new_text, dtype: object
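
Two things about the pattern above are worth noting: it leaves the space before the keyword in place, and it would also match a keyword inside a longer word (for example "low" inside "slow"). The following is a minimal, self-contained sketch of one possible variant, not the only way to do it. It assumes the keywords should only match as whole words and that the preceding whitespace should be dropped. The sample DataFrame is rebuilt from the question, and the names keywords and pattern are illustrative.

```python
import re

import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({
    "text": [
        "Ticket creation dropped in last 24 hours medium range for cust_a",
        "Calls dropped in last 3 months high range for cust_x",
    ]
})

keywords = ["high", "medium", "low"]

# \s*  drops the whitespace just before the keyword
# \b   restricts matches to whole words (assumption: "slow" should not match "low")
# .*   consumes everything after the keyword up to the end of the string
pattern = r"\s*\b(?:" + "|".join(map(re.escape, keywords)) + r")\b.*"

df["new_text"] = df["text"].str.replace(pattern, "", regex=True)

print(df["new_text"].tolist())
```

re.escape is not strictly needed for these three keywords, but it keeps the pattern safe if a keyword ever contains regex metacharacters.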

Upvotes: 2
