Reputation: 664
I have a text from which I would like to remove every run of consecutive uppercase characters up to and including a colon. So far I have only figured out how to remove all characters up to a colon, which results in the current output shown below.
Input Text
text = 'ABC: This is a text. CDEFG: This is a second text. HIJK: This is a third text'
Desired output:
'This is a text. This is a second text. This is a third text'
Current code & output:
re.sub(r'^.+[:]', '', text)
#current output
' This is a third text'
Can this be done with a one-liner regex, or do I need to iterate over every character, check character.isupper(), and then apply a regex?
Upvotes: 0
Views: 63
Reputation: 163457
You can use
\b[A-Z]+:\s*
\b
A word boundary to prevent a partial match
[A-Z]+:
Match 1+ uppercase chars A-Z and a :
\s*
Match optional whitespace chars
import re
text = 'ABC: This is a text. CDEFG: This is a second text. HIJK: This is a third text'
print(re.sub(r'\b[A-Z]+:\s*', '', text))
Output
This is a text. This is a second text. This is a third text
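As a rough side-by-side sketch, the snippet below compares the attempt from the question with the pattern above, and shows what the word boundary buys you. The names `greedy`, `label`, `cleaned` and `partial`, and the sample string 'xABC: kept', are made up for illustration only.

```python
import re

text = 'ABC: This is a text. CDEFG: This is a second text. HIJK: This is a third text'

# Attempt from the question: ^.+ is greedy, so it consumes everything
# up to the LAST colon in the string, not the first one.
greedy = re.sub(r'^.+[:]', '', text)

# Pattern from this answer, compiled once so it can be reused.
label = re.compile(r'\b[A-Z]+:\s*')
cleaned = label.sub('', text)

# \b keeps the match from starting in the middle of a word:
# there is no word boundary between 'x' and 'A', so nothing is removed.
partial = label.sub('', 'xABC: kept')

print(repr(greedy))
print(repr(cleaned))
print(repr(partial))
```

The greedy version leaves only the text after the final colon (including the leading space), while the answer's pattern removes each uppercase label separately. Note that [A-Z] only covers ASCII uppercase letters, so labels containing digits or non-ASCII letters would need a different character class.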
Upvotes: 1