Reputation: 785
Here is the text:
1234567890
The regular expression:
s/(\d)((\d\d\d)+\b)/\1,\2/g
The expected result:
1,234,567,890
The actual result:
1,234567890
This example comes from Mastering Regular Expressions and is meant to insert a comma after every three digits, counting from right to left. Here is the book's explanation:
This is because the digits matched by (\d\d\d)+ are now actually part of the final match, and so are not left "unmatched" and available to the next iteration of the regex via the /g.
But I still don't understand it, and I hope somebody can help me work through it in detail. Thanks in advance.
Upvotes: 0
Views: 41
Reputation: 26667
Prerequisite
The regex engine matches characters from left to right, and the characters it matches are consumed. That is, once characters have been consumed by a successful match, the engine cannot go back and consume them again in a later match.
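A small sketch of this consumption rule, using Python's `re` module. Python is an assumption here, since the question does not name a language, and the sample string is made up for illustration:

```python
import re

# Hypothetical sample input, not taken from the question.
text = "12345"

# findall scans left to right and returns non-overlapping matches.
# Once "12" has been consumed, the next attempt starts at "3",
# so the overlapping pair "23" is never reported.
pairs = re.findall(r"\d\d", text)
print(pairs)
```

The overlapping candidates "23" and "45" are skipped because their first digits were already consumed by an earlier match or, for the trailing "5", because not enough input is left.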
How the match occurs for (\d)((\d\d\d)+\b)
1234567890
|
(\d)
1234567890
 |||
 (\d\d\d)+
1234567890
    |
    \b # cannot be matched, hence it goes for another `(\d\d\d)+`
1234567890
    |||
    (\d\d\d)+
1234567890
       |
       \b # cannot be matched, hence it goes for another `(\d\d\d)+`
1234567890
       |||
       (\d\d\d)+
1234567890
          |
          \b # matched here for the first time.
This is where the problem appears. The engine has consumed every character, and the pointer has reached the end of the input with a successful match, so the substitution \1,\2 takes place. After that, the pointer cannot be moved back to
1234567890
|
(\d)
in order to try again and obtain the expected result.
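The walkthrough above can be checked with a short sketch. It assumes Python's `re` module (the question does not name a language) and uses the text and pattern from the question:

```python
import re

text = "1234567890"
pattern = r"(\d)((\d\d\d)+\b)"

m = re.search(pattern, text)
print(m.span())    # span of the whole match
print(m.group(1))  # the leading digit
print(m.group(2))  # everything the repeated three-digit group consumed

# subn returns the new string and the number of substitutions made.
result, count = re.subn(pattern, r"\1,\2", text)
print(result, count)
```

The single match spans the whole string, so there is nothing left for a second iteration and only one comma is inserted.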
Solution
You haven't mentioned which language you are using, so I will assume it supports PCRE.
Lookaheads are very useful here.
s/(\d)(?=(\d\d\d)+\b)/\1,/g
Here the second group, (?=(\d\d\d)+\b),
is a lookahead. It does not consume any characters; it only checks whether they could be matched at that point, so the digits stay available for the next iteration of /g.
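As a rough illustration, here is the same substitution written with Python's `re.sub`. Python is an assumption (the question does not say which language is in use), but its `re` module supports this lookahead syntax:

```python
import re

text = "1234567890"

# The lookahead only inspects what follows; the match itself is a single digit,
# so the remaining digits can still be matched on later iterations.
result = re.sub(r"(\d)(?=(\d\d\d)+\b)", r"\1,", text)
print(result)  # 1,234,567,890
```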
Or, using lookarounds on both sides:
s/(?<=\d)(?=(\d\d\d)+\b)/,/g
Here
(?<=\d)
is a lookbehind. It checks that the position is preceded by a digit.
(?=(\d\d\d)+\b)
is a lookahead. It checks that the position is followed by one or more groups of three digits ending at a word boundary.
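A minimal sketch of the lookaround version, again assuming Python's `re` module. The helper name `add_commas` and the second sample string are hypothetical, chosen only for illustration:

```python
import re

def add_commas(s: str) -> str:
    # Zero-width match: a position preceded by a digit and followed by
    # one or more complete groups of three digits ending at a word boundary.
    # Nothing is consumed, so only the comma is inserted.
    return re.sub(r"(?<=\d)(?=(\d\d\d)+\b)", ",", s)

print(add_commas("1234567890"))
print(add_commas("The total is 1234567 units"))
```

Because the pattern matches a position rather than any characters, the replacement does not need backreferences.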
Upvotes: 1