Rdou

Reputation: 219

About this regular expression (?<=\d)\d{4}

I use (?<=\d)\d{4} to match against 1234567890, and the result is 2345 and 6789. Why isn't it 2345 and 7890?

I expected the second match to start after 6, with the 6 being matched by (?<=\d), so I thought the result would be 7890 rather than 6789.

Also, what happens when ((?<=\d)\d{3})+ is matched against 1234567890?

Upvotes: 1

Views: 116

Answers (4)

Bohemian

Reputation: 424993

Lookbehinds are non-consuming, so the 5 is "reused" by the second match (even though the first match consumed it).
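To see the "reuse" concretely, here is a minimal sketch using Python's standard `re` module (an assumption; the question does not say which engine was used). It prints each match with its span and the character the lookbehind inspected.

```python
import re

text = "1234567890"
pattern = re.compile(r"(?<=\d)\d{4}")

matches = list(pattern.finditer(text))
for m in matches:
    # The lookbehind only inspects the character just before the match start;
    # that character is not part of the match itself.
    checked = text[m.start() - 1]
    print(m.group(), m.span(), "lookbehind saw", checked)
```

The second match starts at index 5, and the character the lookbehind checks there is the 5 that ended the first match.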

If you want the second match to be 7890, consume the leading digit instead of just asserting it, and capture only the four digits that follow:

\d(\d{4})

And use group 1, or, if your regex engine supports it, use a negative lookahead for \G, which matches at the end of the previous match:

(?!\G)(?<=\d)\d{4}

See a live demo.
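As a rough check of the consume-but-don't-capture variant, here is a short sketch in Python (an assumption; the answer does not name an engine). Python's standard `re` module does not support \G, so only the \d(\d{4}) form is shown.

```python
import re

text = "1234567890"

# \d consumes the leading digit, so the next search resumes after the
# whole five-character match; only the parenthesised part is captured.
consumed = re.findall(r"\d(\d{4})", text)
print(consumed)
```

With a single capturing group, `re.findall` returns the group contents rather than the whole matches, which is the "use group 1" step.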

Upvotes: 3

aelor

Reputation: 11116

(?<=\d)\d{4}

(?<=...) is a lookbehind. It makes sure a digit precedes the text to be matched.

What text are we matching? \d{4}. So the meaning is: match four digits that are preceded by a digit.

In 1234567890 the first such match is 2345, as it is preceded by 1. Now we have one match, and the string being searched is still 1234567890. Checking the regex condition again means finding a group of four digits that has a digit before it. Since 2345 has already been matched, the next successful match is 6789, which is preceded by 5 and so satisfies the regex.

Coming to (?<=\d)\d{3}: it does the same thing as before, only it matches groups of 3. To get the regex you mentioned, we wrap the whole thing in a capture group, ((?<=\d)\d{3}), and ask for one or more repetitions of it: ((?<=\d)\d{3})+. A repeated capturing group only keeps the last iteration.

So the overall match is 234567890, and group 1 holds 890.
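The difference between the overall match and the repeated group can be checked with a small sketch, assuming Python's `re` module (other engines may report groups differently in their tooling):

```python
import re

m = re.search(r"((?<=\d)\d{3})+", "1234567890")

whole = m.group(0)           # everything matched by all iterations
last_iteration = m.group(1)  # the group keeps only its final iteration
print(whole, last_iteration)
```

The group is repeated three times (234, 567, 890), but only the last iteration remains available as group 1.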

Upvotes: 0

Szymon

Reputation: 43023

It matches this way because the first match ends at 5, so the next match can start at 6. In this case (?<=\d) checks the 5, and the match is 6789, starting at 6.

(?<=\d) isn't part of the match and doesn't consume a character; it only asserts what immediately precedes the match.

Upvotes: 0

xdazz

Reputation: 160833

(?<=\d) is a zero-length assertion. Assertions do not consume characters in the string; they only assert whether a match is possible or not.
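One way to see the zero-length behaviour is to use the lookbehind on its own. A minimal sketch, assuming Python's `re` module and a made-up input string "a1b":

```python
import re

# The pattern is only the assertion, so a successful match is empty.
m = re.search(r"(?<=\d)", "a1b")

empty = m.group()
span = m.span()
print(repr(empty), span)
```

The match succeeds at the position right after the 1, yet it contains no characters: the start and end of the span are the same index.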

Upvotes: 0
