Reputation: 1561
I am trying to extract the content of a DataFrame text column up to the point where a specific word appears. The text should be cut off at the first occurrence of any of the following words:
high, medium, low
Sample view of the text in the DataFrame:
text
Ticket creation dropped in last 24 hours medium range for cust_a
Calls dropped in last 3 months high range for cust_x
Expected output:
text, new_text
Ticket creation dropped in last 24 hours medium range for cust_a, Ticket creation dropped in last 24 hours
Calls dropped in last 3 months high range for cust_x, Calls dropped in last 3 months
Upvotes: 1
Views: 44
Reputation: 23099
You need str.replace with regex=True.
The idea is to match any word from your list, then replace it and everything after it with an empty string.
We use .* to match anything until the end of the string:
words = 'high, medium, low'
# build an alternation pattern from the comma-separated list
match_words = '|'.join(words.split(', '))
#'high|medium|low'
# remove the first matching word and everything after it
df['new_text'] = df['text'].str.replace(f"({match_words}).*",'',regex=True)
print(df['new_text'])
0 Ticket creation dropped in last 24 hours
1 Calls dropped in last 3 months
Name: new_text, dtype: object
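One caveat with the pattern above: it also matches the keywords inside longer words (for example "low" inside "below"), and it leaves the space that preceded the keyword at the end of the result. A minimal sketch of a stricter variant, assuming the sample data from the question; the `\b` word boundaries, the leading `\s*`, and the use of `re.escape` are my additions, not part of the original answer:

```python
import re

import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "text": [
        "Ticket creation dropped in last 24 hours medium range for cust_a",
        "Calls dropped in last 3 months high range for cust_x",
    ]
})

words = ["high", "medium", "low"]

# \s*  swallows the whitespace in front of the keyword
# \b   restricts the match to whole words, so "below" is left alone
# .*   consumes the rest of the string
# re.escape guards against regex metacharacters in the keywords
pattern = r"\s*\b(?:" + "|".join(map(re.escape, words)) + r")\b.*"

df["new_text"] = df["text"].str.replace(pattern, "", regex=True)
print(df["new_text"].tolist())
```

Rows that contain none of the keywords are left unchanged, since nothing matches.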
Upvotes: 2