SPB

Reputation: 146

How to combine regex lookaround expressions

For this sample text:

The quick brown fox jumps over the lazy dog" is an 1*** English-language 2*** pangram—a phrase that contains all of the letters of the alphabet. It is commonly used for touch-typing practice. It is also used to test typewriters and computer keyboards, show fonts, and other applications involving all of the letters in the 3*** English alphabet 4***.

I need a single regex that matches only the text between the numbered x*** tokens, with leading and trailing whitespace stripped. If my limited knowledge of regex is correct, the result should come back as two separate matches (or capture groups), one per pair of lookarounds:

English-language

English alphabet

I have two expressions that work in isolation but not in tandem:

(?<=1\*\*\*\s).*(?=\s2\*\*\*)
....
(?<=3\*\*\*\s).*(?=\s4\*\*\*)

I have tried various ways to combine them into one expression but only got incorrect results, for example:

(?<=1\*\*\*\s).*(?=\s2\*\*\*)\w+(?<=3\*\*\*\s).*(?=\s4\*\*\*)

NO MATCHES

I should point out that I have control over the token format, so feel free to recommend one based on ease of use in regex. It just needs to comprise a sequence of mostly non-alphanumeric characters so it's not found natively in the data. My guess is I likely need at least two tokens; one start and one end.

EDIT: I have made progress, but my regex engine behaves differently from the one at regex101:

(?<=1\*\*\*\s)(.*)(?=\s2\*\*\*).*?(?<=3\*\*\*\s)(.*)(?=\s4\*\*\*)

Results in:

English-language 2*** 3*** English-language

Why? How can this be corrected?

Upvotes: 4

Views: 207

Answers (2)

ndnenkov

Reputation: 36100

If you want a regex that will match one or the other, you can just use alternation (|):

(?<=1\*\*\*\s).*(?=\s2\*\*\*)|(?<=3\*\*\*\s).*(?=\s4\*\*\*)
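As a rough illustration, here is how the alternation could be run in Python's `re` module. The engine choice and the shortened sample text are assumptions made for this sketch; the question does not say which engine is in use.

```python
import re

# Abbreviated version of the question's sample text (shortened for this sketch).
text = (
    'The quick brown fox jumps over the lazy dog" is an 1*** English-language 2*** '
    "pangram. It is used in applications involving all of the letters in the "
    "3*** English alphabet 4***."
)

# Two lookaround pairs joined with alternation: each match is one or the other.
pattern = re.compile(
    r"(?<=1\*\*\*\s).*(?=\s2\*\*\*)"
    r"|"
    r"(?<=3\*\*\*\s).*(?=\s4\*\*\*)"
)

# With no capturing groups, findall returns the full text of each match.
matches = pattern.findall(text)
print(matches)
```

Note that `.*` is greedy, so if an end token such as `2***` could appear more than once, the match would run to its last occurrence; `.*?` avoids that.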



If you want a regex that matches both in one go and returns them in separate groups, put .*? in between and wrap each part in a capturing group (()):

(?<=1\*\*\*\s)(.*)(?=\s2\*\*\*).*?(?<=3\*\*\*\s)(.*)(?=\s4\*\*\*)
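A minimal sketch of reading the two groups back, again assuming Python's `re` module and a shortened copy of the sample text (both are assumptions, not details from the question):

```python
import re

# Abbreviated version of the question's sample text (shortened for this sketch).
text = (
    'The quick brown fox jumps over the lazy dog" is an 1*** English-language 2*** '
    "pangram. It is used in applications involving all of the letters in the "
    "3*** English alphabet 4***."
)

# One overall match; the two wanted pieces land in groups 1 and 2.
pattern = re.compile(
    r"(?<=1\*\*\*\s)(.*)(?=\s2\*\*\*).*?(?<=3\*\*\*\s)(.*)(?=\s4\*\*\*)"
)

m = pattern.search(text)
if m is None:
    raise ValueError("pattern did not match the sample text")

# group(0) spans everything from the first piece to the second;
# groups() holds only the captured pieces.
first, second = m.groups()
print(first)
print(second)
```

Here the full match (`m.group(0)`) also contains the tokens and the text between them, so only the groups should be used as the result.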


Upvotes: 1

vks

Reputation: 67968

(?<=[1-9]\*\*\*)\s*(.*?)(?=\s*[1-9]\*\*\*)

You can use this and grab group 1. See the demo:

https://regex101.com/r/cZ0sD2/9

If you only want the two matches, use:

(?<=[13]\*\*\*)\s*(.*?)(?=\s*[24]\*\*\*)
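To show the difference between the two patterns, here is a small sketch in Python's `re` module, using a shortened copy of the sample text (the engine and the abbreviated text are assumptions of this example). The general pattern also picks up the text between `2***` and `3***`, while the restricted one returns only the two wanted pieces.

```python
import re

# Abbreviated version of the question's sample text (shortened for this sketch).
text = (
    'The quick brown fox jumps over the lazy dog" is an 1*** English-language 2*** '
    "pangram. It is used in applications involving all of the letters in the "
    "3*** English alphabet 4***."
)

# Any digit token on either side: matches every span between adjacent tokens.
general = re.compile(r"(?<=[1-9]\*\*\*)\s*(.*?)(?=\s*[1-9]\*\*\*)")

# Only 1***/3*** as start tokens and 2***/4*** as end tokens.
restricted = re.compile(r"(?<=[13]\*\*\*)\s*(.*?)(?=\s*[24]\*\*\*)")

# With exactly one capturing group, findall returns the group 1 text of each match.
all_spans = general.findall(text)
wanted = restricted.findall(text)

print(len(all_spans))
print(wanted)
```

Because `\s*` sits outside the group on both sides, the captured text comes back already stripped of the whitespace next to the tokens.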

Upvotes: 1
