Whitey

Regex Exclude Character From Group

I have a response:

MS1:111980613994
124 MS2:222980613994124

I have the following regex:

MS\d:(\d(?:\r?\n?)){15}

As I understand the regex syntax, the "(?:\r?\n?)" part should let the line break be matched within the group but excluded from the capture (so I get a contiguous value from the group).

The problem is that for "MS1:xxx" the regex matches the [CR][LF] and includes it in the group, when it should be excluded from the capture.

Help please.

Upvotes: 3

Views: 8595

Answers (4)

Laurence Gonsalves

Reputation: 143344

The (?:...) syntax does not mean that the enclosed pattern will be excluded from any capture groups that enclose the (?:...).

It only means that the group formed by (?:...) will be a non-capturing group, as opposed to a new capture group.

Put another way:

  • (?:...) only groups
  • (...) has two functions: it both groups and captures.

Capture groups capture all of the text matched by the pattern they enclose, even the parts that are matched by nested groups (whether they are capturing or not).

An example

With the regex...

.*(l.*(o.*o).*l).*

...there are two capture groups. If we match this against hello world we get the following captures:

  • 1: lo worl
  • 2: o wo

Note that the text captured by group 2 is also captured by group 1.
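These captures can be checked with Python's re module (Python is assumed here only because the answer's later snippet uses it):

```python
import re

# Two capture groups: group 1 encloses group 2.
m = re.match(r".*(l.*(o.*o).*l).*", "hello world")
print(m.group(1))  # lo worl
print(m.group(2))  # o wo
```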

If we change the inner group to be non-capturing...

.*(l.*(?:o.*o).*l).*

...group 1's capture will not be changed (when matched against the same string), but there is no longer a group 2:

  • 1: lo worl

As you can see, if a non-capturing group is enclosed by a capture group, that enclosing capture group will capture the characters matched by the non-capturing group.
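A short check in Python's re. The first match uses the non-capturing variant of the example; the second uses a hypothetical input "1\r\n2" to mirror the situation in the question, where a nested (?:\r?\n) matches a line break that still ends up in the enclosing capture:

```python
import re

# Non-capturing inner group: only one group exists, and group 1
# still spans the text matched by (?:o.*o).
m1 = re.match(r".*(l.*(?:o.*o).*l).*", "hello world")
print(m1.groups())  # ('lo worl',)

# Same idea with a line break: (?:\r?\n) matches "\r\n", and the
# enclosing capture group 1 records it along with the digits.
m2 = re.search(r"(\d(?:\r?\n)\d)", "1\r\n2")
print(repr(m2.group(1)))  # '1\r\n2'
print(len(m2.groups()))   # 1
```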

What are they for?

The purpose of non-capturing groups is not to exclude content from other capturing groups, but rather to act as a way to group operations without also capturing.

For example, if you want to repeat a substring, you might write (?:substring)*.
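For instance, in Python's re (the substring "ab" is an arbitrary illustration), (?:ab)* repeats the whole two-character unit while adding no group, whereas ab* would repeat only the b:

```python
import re

# (?:ab)* repeats "ab" as a unit but creates no capture group.
grouped = re.fullmatch(r"(?:ab)*", "ababab")
print(grouped.group(0))  # ababab
print(grouped.groups())  # ()

# Without the group, * applies only to "b", so the whole string cannot match.
plain = re.fullmatch(r"ab*", "ababab")
print(plain)  # None
```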

How do I solve my real problem?

If you really want to ignore embedded \r and \n characters, your best bet is to strip them out in a second step. You don't say what language you're using, but something equivalent to this (Python) should work:

import re
s = re.sub(r'[\r\n]', '', s)
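A fuller sketch of the two-step approach in Python, using the sample response from the question (assumed to contain "\r\n" after the twelfth digit of MS1). The pattern is an assumption: the capture group is moved so that it wraps the whole {15} repetition, because a group that is itself repeated, as in the question's (\d(?:\r?\n?)){15}, keeps only its last repetition in Python's re.

```python
import re

response = "MS1:111980613994\r\n124 MS2:222980613994124"

# Step 1 (assumed pattern): the capture group wraps the whole {15}
# repetition, so group 1 holds all 15 digits plus any embedded line break.
pattern = re.compile(r"MS\d:((?:\d\r?\n?){15})")

# Step 2: strip the \r and \n characters from each captured value.
values = [re.sub(r"[\r\n]", "", m.group(1)) for m in pattern.finditer(response)]
print(values)  # ['111980613994124', '222980613994124']
```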

Upvotes: 4

raccoon

Reputation: 1

How about MS\d:(?:(\d)\r?\n?){15}
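A quick check of this pattern in Python's re against the question's sample response (assumed to contain "\r\n" after the twelfth digit of MS1): because the capturing group (\d) sits inside the {15} repetition, each repetition overwrites it, so group 1 ends up holding only the last digit.

```python
import re

response = "MS1:111980613994\r\n124 MS2:222980613994124"

# (\d) is repeated 15 times; each repetition overwrites group 1,
# so only the final digit of each match is kept.
pattern = re.compile(r"MS\d:(?:(\d)\r?\n?){15}")
last_digits = [m.group(1) for m in pattern.finditer(response)]
print(last_digits)  # ['4', '4']
```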

Upvotes: -2

boxoft

Reputation: 189

As far as I know, you'll have to use two regexes: one is "MS\d:(\d(?:\r?\n?)){15}", and the other removes the line breaks from the matches.

Please refer to "Regular expression to skip character in capture group".

Upvotes: 0

Cascabel

Reputation: 497712

Perhaps what you mean to do here is place the [CR][LF] matching part outside of the captured group, something like: MS\d:(\d){15}(?:\r?\n?)

Upvotes: 0
