Reputation:
I have a response:
MS1:111980613994 124 MS2:222980613994124
I have the following regex:
MS\d:(\d(?:\r?\n?)){15}
According to Regex, the "(?:\r?\n?)" part should let it match for the group but exclude it from the capture (so I get a contiguous value from the group).
The problem is that for "MS1:xxx" it matches the [CR][LF] and includes it in the group. It should be excluded from the capture ...
Help please.
Upvotes: 3
Views: 8595
Reputation: 143344
The (?:...) syntax does not mean that the enclosed pattern will be excluded from any capture groups that enclose the (?:...).
It only means that the group formed by (?:...) will be a non-capturing group, as opposed to a new capture group.
Put another way:
(?:...) only groups.
(...) has two functions: it both groups and captures.
Capture groups capture all of the text matched by the pattern they enclose, even the parts that are matched by nested groups (whether they are capturing or not).
With the regex...
.*(l.*(o.*o).*l).*
...there are two capture groups. If we match this against hello world we get the following captures:
lo worl
o wo
Note that the text captured by group 2 is also captured by group 1.
If we change the inner group to be non-capturing...
.*(l.*(?:o.*o).*l).*
...group 1's capture will not be changed (when matched against the same string), but there is no longer a group 2:
lo worl
As you can see, if a non-capturing group is enclosed by a capture group, that enclosing capture group will capture the characters matched by the non-capturing group.
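Both cases above can be checked with a short sketch. Python is assumed here only because it is the language used later in this answer, and the variable names are illustrative:

```python
import re

text = "hello world"

# Two capture groups: the inner group's text is also part of group 1.
both = re.fullmatch(r".*(l.*(o.*o).*l).*", text)
print(both.groups())        # ('lo worl', 'o wo')

# Inner group made non-capturing: group 1 is unchanged, group 2 is gone.
outer_only = re.fullmatch(r".*(l.*(?:o.*o).*l).*", text)
print(outer_only.groups())  # ('lo worl',)
```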
The purpose of non-capturing groups is not to exclude content from other capturing groups, but rather to act as a way to group operations without also capturing.
For example, if you want to repeat a substring, you might write (?:substring)*.
If you really want to ignore embedded \rs and \ns, your best bet is to strip them out in a second step. You don't say what language you're using, but something equivalent to this (Python) should work:
s = re.sub(r'[\r\n]', '', s)
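Applied to the response in the question, that might look like the sketch below. The sample string (with a CR LF assumed to split the first value) and the simplified pattern MS\d:(\d{15}) are illustrative assumptions, not taken from the question:

```python
import re

# Assumed sample: the first 15-digit value is split by a CR LF.
response = "MS1:111980613994\r\n124\r\nMS2:222980613994124"

# Step 1: strip every CR and LF so the digits become contiguous.
cleaned = re.sub(r"[\r\n]", "", response)

# Step 2: match on the cleaned text; the quantifier sits inside the group.
values = re.findall(r"MS\d:(\d{15})", cleaned)
print(values)  # ['111980613994124', '222980613994124']
```

Stripping first keeps the pattern simple, at the cost of discarding line breaks everywhere in the input, so it only fits when the line breaks carry no meaning of their own.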
Upvotes: 4
Reputation: 189
So far as I know, you'll have to use 2 regexes. One is "MS\d:(\d(?:\r?\n?)){15}", the other is used to remove the line breaks from the matches.
Please refer to "Regular expression to skip character in capture group".
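A minimal sketch of the two-regex approach, assuming Python and an input where a CR LF splits the first value. The first pattern is regrouped so that one outer group wraps the whole repetition, because in Python a capture group that is itself repeated keeps only its last iteration:

```python
import re

# Assumed sample input, not taken verbatim from the question.
response = "MS1:111980613994\r\n124\r\nMS2:222980613994124"

# First regex: capture 15 digits, each optionally followed by a line break.
pattern = re.compile(r"MS\d:((?:\d\r?\n?){15})")

# Second regex: remove the line breaks from each captured value.
values = [re.sub(r"[\r\n]", "", raw) for raw in pattern.findall(response)]
print(values)  # ['111980613994124', '222980613994124']
```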
Upvotes: 0
Reputation: 497712
Perhaps what you mean to do here is place the [CR][LF] matching part outside of the captured group, something like: MS\d:(\d){15}(?:\r?\n?)
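A quick check of this idea, assuming Python and a sample where the line break follows the digits. In Python a repeated group such as (\d){15} keeps only its last iteration, so the second pattern moves the quantifier inside the group. Neither pattern handles a line break that falls in the middle of the digits:

```python
import re

# Assumed sample: the CR LF comes after the 15 digits, not inside them.
sample = "MS2:222980613994124\r\n"

# Repeated capture group: only the final iteration is kept.
last_only = re.match(r"MS\d:(\d){15}(?:\r?\n?)", sample)
print(repr(last_only.group(1)))  # '4'

# Quantifier inside the group: all 15 digits, line break left outside.
whole = re.match(r"MS\d:(\d{15})(?:\r?\n?)", sample)
print(repr(whole.group(1)))  # '222980613994124'
```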
Upvotes: 0