Frederi ROSE

Reputation: 351

Gather a repeating 2-group pattern

I am looking for a regex that would turn the following text

The law of Huxley Something interesting. Some other interesting thing. The law of Dallas This thing is boring. The law of void Some stuff.

into a sequence of entries, each made of 2 identified groups:

  1. a first group that starts with "The law" and ends at the first capital letter;
  2. a second group that starts right after it and ends where the next first-group pattern ("The law") is encountered.

The aim is to separate each title from its core text using capturing groups, like this:

The law of Huxley 
Something interesting. Some other interesting thing. 

The law of Dallas 
This thing is boring.

The law of void
Some stuff.

I have tried

((The law [\w\s]+)([A-Z].+))+

to no avail.

Upvotes: 2

Views: 63

Answers (1)

Wiktor Stribiżew

Reputation: 626853

You can use

(The law\s+\w+\s\P{Lu}*)(\p{Lu}.*?)(?=The law|$)

See the regex demo.

Details:

  • (The law\s+\w+\s\P{Lu}*) - Group 1: the literal text "The law", then one or more whitespace chars, one or more word chars, a single whitespace char, and then zero or more chars other than uppercase letters
  • (\p{Lu}.*?) - Group 2: an uppercase letter, and then zero or more chars other than line break chars, as few as possible, up to the first position where the subsequent subpattern matches
  • (?=The law|$) - a positive lookahead that requires either "The law" or the end of the string immediately to the right of the current location.
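
As a rough way to try this out, here is a minimal Python sketch run against the sample text from the question. It rests on two assumptions that are not part of the answer: it uses the built-in re module, which does not support \p{Lu} and \P{Lu}, so the ASCII classes [A-Z] and [^A-Z] stand in for them; and it adds a hypothetical variant, variant_pattern, which assumes every title is "The law" followed by exactly two words.

```python
import re

text = (
    "The law of Huxley Something interesting. Some other interesting thing. "
    "The law of Dallas This thing is boring. The law of void Some stuff."
)

# The pattern from the answer, with \p{Lu} / \P{Lu} replaced by ASCII classes
# (assumption: the built-in re module is used, which has no \p{...} support).
answer_pattern = re.compile(r"(The law\s+\w+\s[^A-Z]*)([A-Z].*?)(?=The law|$)")
answer_matches = answer_pattern.findall(text)

# Hypothetical variant, not from the answer: assumes each title is
# "The law" plus exactly two words, and trims whitespace around the body.
variant_pattern = re.compile(
    r"(The law\s+\w+\s+\w+)\s+([A-Z].*?)\s*(?=The law|$)"
)
variant_matches = variant_pattern.findall(text)

# Print each title and body on separate lines, as in the question.
for title, body in variant_matches:
    print(title)
    print(body)
    print()
```

On this sample, the answer's pattern ends Group 1 at the first uppercase letter, so the first two entries give "The law of " as Group 1 and the name ("Huxley ...", "Dallas ...") opens Group 2, while the last entry gives "The law of void ". The two-word variant keeps the name in the title, at the cost of breaking on titles with a different word count.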

Upvotes: 1
