Reputation: 61

How to make capture group "absorb" whitespace before/after it without capturing it?

I have a regex, found here. Try it with the strings below: the problem I'm facing is that every captured group after the 1st one starts with an extra whitespace character. I need that whitespace to be matched, but I don't need it to be captured.

Regex:

^(\/[a-zA-Z0-9]+)?(\s~[a-zA-Z]+)?([\w\s'()-]+)?((?:\s~[a-zA-Z]+){0,2})?$

Viewing it at the link above makes it much simpler to comprehend.

These are some strings you can paste into the test string area one by one:

/test ~example matches ~extra ~space
this too has an extra ~space ~matched
/like wise for this
/and ~this

Take a look at the match groups area and notice that, after the 1st group, the single whitespace character preceding each group is captured as part of that group.

What I want to do is this:

For the 1st and 2nd capture group, I want them to detect a succeeding space and absorb it but not capture it, so that the 3rd capture group won't detect and capture the extra space. For the 4th capture group, I want it to detect a preceding space and absorb it but not capture it.

What I mean by absorb is that the space gets "removed" in a sense that the 3rd capture group won't realize it's there.

How can I do this?

Thanks.

Upvotes: 2

Answers (3)

Kamehameha

Reputation: 5473

This is the regex that I came up with-

^(\/[a-zA-Z0-9]+)?(?:\s)?(~[a-zA-Z]+)?(?:\s)?([\w\'()\-\s]+)?(?:\s(~[a-zA-Z]+))?(?:\s(~[a-zA-Z]+))?$

Elaborating on the regex in 2 parts, as per the requirements:

For the 1st and 2nd capture group, I want them to detect a succeeding space and absorb it but not capture it, so that the 3rd capture group won't detect and capture the extra space.

Your regex for the 1st and 2nd groups -

(\/[a-zA-Z0-9]+)?(\s~[a-zA-Z]+)?

So, after each of the first and second capturing groups, I've added a non-capturing (?:\s)? and removed the \s from inside the 2nd group. This keeps the 3rd capturing group from absorbing the preceding space. This is my regex -

(\/[a-zA-Z0-9]+)?(?:\s)?(~[a-zA-Z]+)?(?:\s)?
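As a quick check of this first half on its own, here is a minimal JavaScript sketch. The variable names and the sample input are only illustrative, and a leading ^ has been added as in the full pattern. The optional (?:\s)? consumes the space, so it shows up in the overall match but in none of the numbered groups.

```javascript
// First half of the pattern: the spaces sit in non-capturing groups.
const prefixPattern = /^(\/[a-zA-Z0-9]+)?(?:\s)?(~[a-zA-Z]+)?(?:\s)?/;

const prefixMatch = "/test ~example matches".match(prefixPattern);

// JSON.stringify makes the consumed spaces visible in the printed text.
console.log(JSON.stringify(prefixMatch[0])); // "/test ~example "
console.log(JSON.stringify(prefixMatch[1])); // "/test"
console.log(JSON.stringify(prefixMatch[2])); // "~example"
```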

For the 4th capture group, I want it to detect a preceding space and absorb it but not capture it.

Your regex

((?:\s~[a-zA-Z]+){0,2})?

Here, an obvious solution would be to capture only the text part (~[a-zA-Z]+) and leave the \s part outside the capturing group. Something like this,

(?:(?:\s(~[a-zA-Z]+)){0,2})?
         ^^^^^^^^^^ Capturing only this.

But this is a repeated capturing group, where each iteration overwrites what the previous one captured. Basically, a repeated capturing group will only capture the last iteration. So if you wanted to match

" ~space ~matched", it would only capture the last "~matched".

So, since the quantifier is {0,2}, one solution is to write the optional group out explicitly 2 times, like so -

(?:\s(~[a-zA-Z]+))?(?:\s(~[a-zA-Z]+))?
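To see the difference between the repeated group and the written-out version, here is a small JavaScript sketch that runs both against the same tail string. The variable names are only illustrative, and both patterns are anchored with ^ and $ so they can be tested in isolation.

```javascript
const tail = " ~space ~matched";

// Repeated capturing group: only the last iteration is kept in group 1.
const repeatedGroup = /^(?:(?:\s(~[a-zA-Z]+)){0,2})?$/;
const repeatedResult = tail.match(repeatedGroup).slice(1);

// Written-out version: each optional tag gets its own capturing group.
const unrolledGroups = /^(?:\s(~[a-zA-Z]+))?(?:\s(~[a-zA-Z]+))?$/;
const unrolledResult = tail.match(unrolledGroups).slice(1);

console.log(repeatedResult); // [ '~matched' ]
console.log(unrolledResult); // [ '~space', '~matched' ]
```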

But if the {0,2} requirement later changes, a better solution would be to capture the whole group together with its preceding spaces, and then split the captured text on whitespace in a separate step.
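A rough sketch of that capture-then-split idea in JavaScript follows. The helper name extractTags is hypothetical, and the pattern assumes the limit has been relaxed to "any number of trailing ~tags" purely for illustration; it only looks at the end of the string rather than at the whole command.

```javascript
// Assumption: any number of trailing " ~tag" items, captured as one group
// together with their leading spaces.
const trailingTags = /((?:\s~[a-zA-Z]+)*)$/;

// Hypothetical helper: returns the trailing ~tags as an array without spaces.
function extractTags(input) {
  const match = input.match(trailingTags);
  const raw = match ? match[1] : "";
  // Splitting on whitespace leaves an empty first item; filter drops it.
  return raw.split(/\s+/).filter(Boolean);
}

console.log(extractTags("this too has an extra ~space ~matched")); // [ '~space', '~matched' ]
console.log(extractTags("/like wise for this")); // []
```

The regex itself no longer needs one capturing group per tag, so changing the allowed count only means changing the quantifier.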

Output when I run this regex against the given strings in JavaScript:
["/test ~example matches ~extra ~space", "/test", "~example", "matches", "~extra", "~space", index: 0, input: "/test ~example matches ~extra ~space"]
["this too has an extra ~space ~matched", undefined, undefined, "this too has an extra", "~space", "~matched", index: 0, input: "this too has an extra ~space ~matched"]
["/like wise for this", "/like", undefined, "wise for this", undefined, undefined, index: 0, input: "/like wise for this"]
["/and ~this", "/and", "~this", undefined, undefined, undefined, index: 0, input: "/and ~this"]
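For anyone who wants to reproduce this, a minimal script along the following lines should do it. The variable names are only illustrative; the pattern is the full regex from the top of this answer, and the inputs are the four test strings from the question.

```javascript
const commandPattern =
  /^(\/[a-zA-Z0-9]+)?(?:\s)?(~[a-zA-Z]+)?(?:\s)?([\w\'()\-\s]+)?(?:\s(~[a-zA-Z]+))?(?:\s(~[a-zA-Z]+))?$/;

const samples = [
  "/test ~example matches ~extra ~space",
  "this too has an extra ~space ~matched",
  "/like wise for this",
  "/and ~this",
];

// Keep only the five capture groups (drop the full match at index 0).
// Groups that did not take part in the match are undefined.
const captureRows = samples.map((sample) => sample.match(commandPattern).slice(1));

captureRows.forEach((row) => console.log(row));
```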

Hope this helped.

Upvotes: 1

anubhava

Reputation: 785256

Try this regex:

^(\/[a-zA-Z0-9]+)?\s?(~[a-zA-Z]+)?\s*([\w\s'()-]+)?\s?((?:~[a-zA-Z]+\s?){0,2})?$

Online Demo: http://regex101.com/r/rA5tR0

Upvotes: 0

Thayne

Reputation: 6992

I think this does what you want:

^(\/[a-zA-Z0-9]+)?(?:(\s~[a-zA-Z]+)\s)?([\w\s'()-]+)?(?:\s((?:~[a-zA-Z]+\s?){0,2}))?$

Upvotes: 0
