Reputation: 11
Given a text like this:
text= "THE TEXT contains uppercase letter, but ALSO LOWER case ones. This is another sentence."
I want output like this:
['THE TEXT contains uppercase letter, but', 'ALSO LOWER case ones. This is another sentence.']
How can I write a regex to obtain that output?
I tried this regex: "(\b[A-Z][A-Z]+(?:\s+[A-Z][A-Z]+)*\b)"
but the output was different:
[ '',
'THE TEXT',
'contains uppercase letter, but',
'ALSO LOWER',
'case ones. This is another sentence.']
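For reference, the output above looks like what re.split returns when the pattern is wrapped in a capturing group: the captured delimiters are kept in the result, and a leading empty string appears because the text starts with a match. The snippet below is a minimal sketch that assumes re.split was the call used (the question does not say so); the name `parts` is only illustrative.

```python
import re

text = "THE TEXT contains uppercase letter, but ALSO LOWER case ones. This is another sentence."

# Assumption: the attempt used re.split. Because the whole pattern sits in a
# capturing group, re.split also returns each matched uppercase run.
pattern = r"(\b[A-Z][A-Z]+(?:\s+[A-Z][A-Z]+)*\b)"
parts = re.split(pattern, text)

# The pieces between matches keep their surrounding whitespace,
# so strip them before printing.
print([p.strip() for p in parts])
```

So the uppercase runs end up as separate list items instead of staying attached to the text that follows them.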
Upvotes: 1
Views: 480
Reputation: 626893
You can match and extract them with
re.findall(r'\b[A-Z]{2,}(?:\s+[A-Z]{2,})*\b.*?(?=\s*\b[A-Z]{2}|$)', text, re.DOTALL)
See the regex demo.
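As a quick check, here is the same call as a self-contained script run against the sample text from the question. The variable names `pattern` and `chunks` are only illustrative.

```python
import re

text = "THE TEXT contains uppercase letter, but ALSO LOWER case ones. This is another sentence."

# Each match starts at a run of all-uppercase words and lazily extends
# until just before the next such run, or to the end of the string.
pattern = r'\b[A-Z]{2,}(?:\s+[A-Z]{2,})*\b.*?(?=\s*\b[A-Z]{2}|$)'
chunks = re.findall(pattern, text, re.DOTALL)

print(chunks)
# ['THE TEXT contains uppercase letter, but', 'ALSO LOWER case ones. This is another sentence.']
```

Note that any text before the first uppercase run would not be returned, since every match has to begin with one.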
Details:
\b[A-Z]{2,}(?:\s+[A-Z]{2,})*\b
- a word boundary, two or more ASCII uppercase letters, zero or more repetitions of one or more whitespaces followed by two or more ASCII uppercase letters, and a word boundary
.*?
- any zero or more chars, as few as possible
(?=\s*\b[A-Z]{2}|$)
- a positive lookahead that matches a location that is immediately followed by zero or more whitespaces, a word boundary and two uppercase letters, or by the end of the string.
Upvotes: 1