Reputation: 146
For this sample text:
The quick brown fox jumps over the lazy dog" is an 1*** English-language 2*** pangram—a phrase that contains all of the letters of the alphabet. It is commonly used for touch-typing practice. It is also used to test typewriters and computer keyboards, show fonts, and other applications involving all of the letters in the 3*** English alphabet 4***.
I need a single regular expression that matches only the text between the numbered x*** tokens, with leading and trailing whitespace stripped. If my limited knowledge of regex is correct, the result should end up in two separate groups delimited by lookarounds. The expected matches are:
English-language
English alphabet
I have two expressions that work in isolation but not in tandem:
(?<=1\*\*\*\s).*(?=\s2\*\*\*)
....
(?<=3\*\*\*\s).*(?=\s4\*\*\*)
I have tried various ways to combine them into one expression but only got incorrect results, e.g.:
(?<=1\*\*\*\s).*(?=\s2\*\*\*)\w+(?<=3\*\*\*\s).*(?=\s4\*\*\*)
NO MATCHES
I should point out that I have control over the token format, so feel free to recommend one based on ease of use in regex. It just needs to comprise a sequence of mostly non-alphanumeric characters so that it does not occur naturally in the data. My guess is that I need at least two tokens: one start and one end.
EDIT: I have made progress, but my regex engine behaves differently from the one at regex101:
(?<=1\*\*\*\s)(.*)(?=\s2\*\*\*).*?(?<=3\*\*\*\s)(.*)(?=\s4\*\*\*)
Results in:
English-language 2*** 3*** English-language
Why? How can this be corrected?
Upvotes: 4
Views: 207
Reputation: 36100
If you want a regex that will match one or the other, you can just use alternation (|):
(?<=1\*\*\*\s).*(?=\s2\*\*\*)|(?<=3\*\*\*\s).*(?=\s4\*\*\*)
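As a rough illustration of the alternation approach, here is a minimal Python sketch. The choice of Python's `re` module and the shortened sample string are assumptions made for the example; the question does not say which engine is in use, and other engines may treat lookbehinds differently.

```python
import re

# Shortened stand-in for the sample text in the question (an assumption).
text = "is an 1*** English-language 2*** pangram used with the 3*** English alphabet 4***."

# Either branch may match; each branch is bounded by its own lookarounds.
pattern = re.compile(r"(?<=1\*\*\*\s).*(?=\s2\*\*\*)|(?<=3\*\*\*\s).*(?=\s4\*\*\*)")

# With no capturing groups, findall returns the full text of each match.
matches = pattern.findall(text)
print(matches)  # ['English-language', 'English alphabet']
```

Each result is a separate match rather than a group of one match, so the caller iterates over the list instead of reading group 1 and group 2.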
If you want a regex that will match both in one go in separate groups, you can use .*? in between and put them in capturing groups (()):
(?<=1\*\*\*\s)(.*)(?=\s2\*\*\*).*?(?<=3\*\*\*\s)(.*)(?=\s4\*\*\*)
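A minimal sketch of the single-match, two-group variant, again assuming Python's `re` module and a shortened one-line sample. If the real data spans several lines, `.` would not cross newlines unless a flag such as `re.DOTALL` is added, and text with repeated tokens may need lazy `(.*?)` groups instead of the greedy ones.

```python
import re

# Shortened stand-in for the sample text in the question (an assumption).
text = "is an 1*** English-language 2*** pangram used with the 3*** English alphabet 4***."

# One overall match; the lazy .*? in the middle skips the text between the pairs.
pattern = re.compile(
    r"(?<=1\*\*\*\s)(.*)(?=\s2\*\*\*).*?(?<=3\*\*\*\s)(.*)(?=\s4\*\*\*)"
)

m = pattern.search(text)
first, second = m.group(1), m.group(2)
print(first)   # English-language
print(second)  # English alphabet
```

Here the whole match spans from the first token pair to the second, and only the two groups hold the wanted text.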
Upvotes: 1
Reputation: 67968
(?<=[1-9]\*\*\*)\s*(.*?)(?=\s*[1-9]\*\*\*)
You can use this and grab group 1. See the demo:
https://regex101.com/r/cZ0sD2/9
If you only want two matches, use:
(?<=[13]\*\*\*)\s*(.*?)(?=\s*[24]\*\*\*)
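To show the difference between the two patterns, here is a small Python sketch; the `re` module and the shortened sample string are assumptions, not something stated in the question. Because `\s*` sits outside the capturing group, group 1 comes back without the surrounding whitespace.

```python
import re

# Shortened stand-in for the sample text in the question (an assumption).
text = "is an 1*** English-language 2*** pangram used with the 3*** English alphabet 4***."

# Any digit token opens or closes a span, so the text between 2*** and 3*** also matches.
any_pair = re.compile(r"(?<=[1-9]\*\*\*)\s*(.*?)(?=\s*[1-9]\*\*\*)")

# Only 1*** and 3*** open a span; only 2*** and 4*** close one.
start_end = re.compile(r"(?<=[13]\*\*\*)\s*(.*?)(?=\s*[24]\*\*\*)")

# With exactly one capturing group, findall returns the group 1 values.
all_spans = any_pair.findall(text)
wanted = start_end.findall(text)
print(all_spans)  # ['English-language', 'pangram used with the', 'English alphabet']
print(wanted)     # ['English-language', 'English alphabet']
```

Restricting the digits to distinct opening and closing sets is one way to get the start/end token distinction the question asks about.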
Upvotes: 1