Andy

Reputation: 6568

Lookaround explanation needed when adding commas to a number

I would appreciate some help understanding what's going on in the code below. It's just not clicking. The snippet (taken from this book)

s/(?<=\d)(?=(\d\d\d)+$)/,/g

converts a number such as 123456789 to 123,456,789 (g is the global flag). Now, say we have the number 1234. As I understand it, (?<=\d) places us just after the 1, like so: 1|234. Then (?=(\d\d\d)+$) picks up where the lookbehind left off and evaluates the remaining digits. Since 234 matches the pattern (three digits followed by the end of the line), the substitution takes place and we get 1,234. I hope I got this right.

Now, I'm confused when I make the number bigger, say 1234567. When I put this into a regex tester I get 1|234|567, but I expected 1234|567. So why? Why does the lookahead at 234 evaluate to true when the 4 is not followed by the end of the line? Does this have anything to do with the global flag? Thanks.

Upvotes: 3

Views: 56

Answers (1)

Tim Pietzcker

Reputation: 336128

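The lookahead looks for multiples of three digits: (\d\d\d)+ matches 3, 6, 9, ... digits, and the $ only has to match after the last group, not after each one. Therefore the lookahead succeeds before 234567 (two groups of three, then the end of the string).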
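To check this in isolation, here is a minimal sketch in Python. Python is an assumption on my part (the snippet in the question is Perl-style), but its re module accepts the same pattern syntax. The helper name lookahead_succeeds is hypothetical; it applies the inside of the lookahead to whatever text remains to the right of a candidate position.

```python
import re

# The inside of the lookahead: one or more groups of exactly three
# digits, followed by the end of the string.
GROUPS_OF_THREE = re.compile(r"(\d\d\d)+$")

def lookahead_succeeds(remainder: str) -> bool:
    # re.match anchors at the start of `remainder`, which mimics the
    # lookahead starting at the current position.
    return GROUPS_OF_THREE.match(remainder) is not None

print(lookahead_succeeds("234567"))  # True: two groups, then end of string
print(lookahead_succeeds("34567"))   # False: five digits is not a multiple of three
print(lookahead_succeeds("567"))     # True: one group, then end of string
```

So the 4 does not need to be followed by the end of the line; it only needs to be followed by another complete group of three that is.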

And yes, the global flag is the reason the regex matches twice. Without it, as you can easily test, only the first match is replaced and the result would be 1|234567.
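The effect of the global flag can be reproduced with a short Python sketch (again an assumption, since the original is a Perl-style substitution). Python has no g flag; the count argument of re.sub plays that role here, and add_commas is a hypothetical helper name.

```python
import re

# Same pattern as in the question: a digit behind us, and a multiple
# of three digits between us and the end of the string.
PATTERN = re.compile(r"(?<=\d)(?=(\d\d\d)+$)")

def add_commas(number: str, replace_all: bool = True) -> str:
    # count=0 replaces every match (like the g flag);
    # count=1 replaces only the first match (like omitting g).
    return PATTERN.sub(",", number, count=0 if replace_all else 1)

print(add_commas("1234567"))                     # 1,234,567
print(add_commas("1234567", replace_all=False))  # 1,234567
```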

Let's see what happens when we go through the string "1234567":

1.  1234567
   ^ (?<=\d) doesn't match - regex fails.
2. 1 234567
    ^ (?<=\d) matches "1", (?=(\d\d\d)+$) matches "234567"! MATCH!
3. 12 34567
     ^ (?<=\d) matches "2", (?=(\d\d\d)+$) doesn't match.
4. 123 4567
      ^ (?<=\d) matches "3", (?=(\d\d\d)+$) doesn't match.
5. 1234 567
       ^ (?<=\d) matches "4", (?=(\d\d\d)+$) matches "567"! MATCH!
6. 12345 67
        ^ (?<=\d) matches "5", (?=(\d\d\d)+$) doesn't match.
7. 123456 7
         ^ (?<=\d) matches "6", (?=(\d\d\d)+$) doesn't match.
8. 1234567
          ^ (?<=\d) matches "7", (?=(\d\d\d)+$) doesn't match.
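The walkthrough above can be reproduced with a small script. This is only a sketch in Python (my choice of language, not the asker's), and trace is a hypothetical helper: it tests the lookbehind and the lookahead separately at every position of the string, relying on the fact that a compiled pattern's match method accepts a start position without cutting off the text before it.

```python
import re

LOOKBEHIND = re.compile(r"(?<=\d)")
LOOKAHEAD = re.compile(r"(?=(\d\d\d)+$)")

def trace(number: str):
    """Yield (position, lookbehind_ok, lookahead_ok) for each position."""
    for pos in range(len(number) + 1):
        # Passing pos keeps the full string visible, so the lookbehind
        # can still inspect the character to the left of pos.
        behind_ok = LOOKBEHIND.match(number, pos) is not None
        ahead_ok = LOOKAHEAD.match(number, pos) is not None
        yield pos, behind_ok, ahead_ok

number = "1234567"
for pos, behind_ok, ahead_ok in trace(number):
    verdict = "MATCH" if behind_ok and ahead_ok else "no match"
    print(f"{number[:pos]}|{number[pos:]}  "
          f"lookbehind={behind_ok} lookahead={ahead_ok} -> {verdict}")
```

Only positions 1 and 4 pass both tests, which is where the two commas end up.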

Upvotes: 3
