kevin

Reputation: 785

Regex: Comma Delimiting large integers (e.g. 2903 -> 2,903)

Here is the text:

1234567890

The regular expression:

s/(\d)((\d\d\d)+\b)/\1,\2/g

The expected result:

1,234,567,890

The actual result:

1,234567890

This example comes from Mastering Regular Expressions. It is meant to add a comma every three digits, counting from right to left. Here is the explanation:

This is because the digits matched by (\d\d\d)+ are now actually part of the final match, and so are not left "unmatched" and available to the next iteration of the regex via the /g.

But I still don't understand it. Could anybody help me work through it in detail? Thanks in advance.

Upvotes: 0

Views: 41

Answers (1)

nu11p01n73R

Reputation: 26667

Prerequisite

The regex engine matches the input from left to right, and the characters it matches are consumed. Once a match has succeeded, the next iteration of /g resumes after the consumed characters; it cannot go back and consume them again.
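To see consumption in isolation, here is a minimal sketch in Python's re module. Python is an assumption on my part, since the question does not name a language, and the sample string "12345" is made up for illustration. It contrasts a consuming pattern with a lookahead that matches without consuming anything.

```python
import re

text = "12345"  # illustrative input, not from the question

# Each match consumes its two digits, so the next search resumes after them.
consumed = re.findall(r"\d\d", text)

# A lookahead matches without consuming, so every starting position is tried.
overlapping = re.findall(r"(?=(\d\d))", text)

print(consumed)     # ['12', '34']
print(overlapping)  # ['12', '23', '34', '45']
```

The first call never reports "23" because the "2" was already consumed by the match "12".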

How does the match occur for (\d)((\d\d\d)+\b)?

  1234567890
  |
(\d)

  1234567890
   |||
  (\d\d\d)+

  1234567890
      |
     \b #cannot be matched, hence it goes for another `(\d\d\d)+`

  1234567890
      |||
    (\d\d\d)+

  1234567890
         |
        \b #cannot be matched, hence it goes for another `(\d\d\d)+`

  1234567890
         |||
      (\d\d\d)+

  1234567890
            |
           \b #matched here for the first time.

Now here the magic happens. The engine has consumed all the characters, and the pointer has reached the end of the input with a successful match. The substitution \1,\2 occurs. Now there is no way to move the pointer back to

 1234567890
    |
   (\d)

in order to obtain the expected result.
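The walkthrough above can be checked with a short sketch. It assumes Python's re module as a stand-in for the s///g syntax in the question (the language is not stated there). It lists the span of every match found in one global pass and then performs the same substitution.

```python
import re

text = "1234567890"
pattern = re.compile(r"(\d)((\d\d\d)+\b)")

# Collect the (start, end) span of every match found in a single global pass.
spans = [m.span() for m in pattern.finditer(text)]

# Rough equivalent of s/(\d)((\d\d\d)+\b)/\1,\2/g
result = pattern.sub(r"\1,\2", text)

print(spans)   # [(0, 10)]
print(result)  # 1,234567890
```

There is only one match and it covers the whole string, so only one comma is inserted.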

Solution

You haven't mentioned which language you are using, so I am assuming it supports PCRE.

Lookaheads are of great use here.

s/(\d)(?=(\d\d\d)+\b)/\1,/g

Here the second group, (?=(\d\d\d)+\b), is a lookahead. It does not consume any characters; it only checks whether they can be matched at the current position.
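As a quick check of the lookahead version, here is a sketch that assumes Python's re module (the question does not say which language is in use). The helper name add_commas is hypothetical.

```python
import re

def add_commas(text):
    # Hypothetical helper; rough equivalent of s/(\d)(?=(\d\d\d)+\b)/\1,/g
    # Only the single digit in group 1 is consumed; the lookahead just peeks.
    return re.sub(r"(\d)(?=(\d\d\d)+\b)", r"\1,", text)

print(add_commas("1234567890"))  # 1,234,567,890
print(add_commas("2903"))        # 2,903
print(add_commas("123"))         # 123
```

Because each match consumes one digit only, the digits that follow stay available to later iterations.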


OR

Using lookarounds only:

s/(?<=\d)(?=(\d\d\d)+\b)/,/g

Here

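  • (?<=\d) is a lookbehind. It checks that the position is preceded by a digit.

  • (?=(\d\d\d)+\b) is a lookahead. It checks that the position is followed by one or more groups of three digits ending at a word boundary.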
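The lookaround-only version can be sketched the same way, again assuming Python's re module since the language is not given. The helper name insert_commas is hypothetical. Every match here is zero-width, so the replacement is just the comma itself.

```python
import re

def insert_commas(text):
    # Hypothetical helper; rough equivalent of s/(?<=\d)(?=(\d\d\d)+\b)/,/g
    # Nothing is consumed: a comma is inserted at each qualifying position.
    return re.sub(r"(?<=\d)(?=(\d\d\d)+\b)", ",", text)

print(insert_commas("1234567890"))     # 1,234,567,890
print(insert_commas("abc 12345 def"))  # abc 12,345 def
```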


Upvotes: 1
