Reputation: 351
I am looking for a regex that would split the following text
The law of Huxley Something interesting. Some other interesting thing. The law of Dallas This thing is boring. The law of void Some stuff.
into two-line entries, with the title and the core text of each entry captured in separate groups, like this:
The law of Huxley
Something interesting. Some other interesting thing.
The law of Dallas
This thing is boring.
The law of void
Some stuff.
I have tried with
((The law [\w\s]+)([A-Z].+))+
to no avail
Upvotes: 2
Views: 63
Reputation: 626853
You can use
(The law\s+\w+\s\P{Lu}*)(\p{Lu}.*?)(?=The law|$)
See the regex demo.
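A minimal Python sketch of the same idea (title group, lazy body group, lookahead for the next title or end of string) might look like the following. It is an adaptation rather than the exact pattern above: the standard re module does not support \p{Lu}, so ASCII [A-Z] stands in for it, and the title is assumed to always be "The law of" followed by exactly one word, which may itself start with an uppercase letter as in the sample. The names PATTERN, split_laws and sample are hypothetical.

```python
import re

# ASCII stand-in for the Unicode-aware classes: [A-Z] replaces \p{Lu}.
# Assumption: every title is "The law of" followed by exactly one word.
PATTERN = re.compile(r"(The law\s+of\s+\w+)\s+([A-Z].*?)\s*(?=The law|$)")


def split_laws(text):
    """Return a list of (title, body) tuples found in text."""
    return [(m.group(1), m.group(2)) for m in PATTERN.finditer(text)]


sample = (
    "The law of Huxley Something interesting. Some other interesting thing. "
    "The law of Dallas This thing is boring. The law of void Some stuff."
)

# Print each title and its body on separate lines.
for title, body in split_laws(sample):
    print(title)
    print(body)
```

The body group is lazy so that it stops at the first upcoming "The law" instead of running to the end of the input; the \s* before the lookahead only keeps trailing whitespace out of the captured body.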
Details:
(The law\s+\w+\s\P{Lu}*) - Group 1: The law text, then one or more whitespace chars, one or more word chars, a whitespace, and then any zero or more chars other than uppercase letters
(\p{Lu}.*?) - Group 2: an uppercase letter, and then any zero or more chars other than line break chars, as few as possible, up to the first occurrence of the subsequent subpatterns
(?=The law|$) - a positive lookahead that requires either The law or end of string immediately to the right of the current location.
Upvotes: 1