Reputation: 150996
The following works in Ruby for commafication (adding commas to a number, so that 12345 becomes 12,345):
def r(s)
  s.gsub(/(?<=\d)(?=(\d\d\d)+\b)/, ",")
end
s = ""
1.upto(20) do |i|
  s += (i % 10).to_s
  puts r(s)
end
But I wonder why the variations r2 and r3 won't work?
def r2(s)
  s.gsub(/(?<=\d)(?=(\d\d\d)+)\b/, ",")
end
def r3(s)
  s.gsub(/(?<=\d)(?=\d\d\d)+\b/, ",")
end
Nothing is modified at all, and I would think that 1234 does match (?<=\d)(?=(\d\d\d)+)\b, so it is a bit strange. (I tried it using Perl as well, so it is not peculiar to Ruby.)
Update: The following is the output for r, while for r2 and r3 no comma is added at all:
1
12
123
1,234
12,345
123,456
1,234,567
12,345,678
123,456,789
1,234,567,890
12,345,678,901
123,456,789,012
1,234,567,890,123
12,345,678,901,234
123,456,789,012,345
1,234,567,890,123,456
12,345,678,901,234,567
123,456,789,012,345,678
1,234,567,890,123,456,789
12,345,678,901,234,567,890
Upvotes: 1
Views: 187
Reputation: 25810
Well, in r2, your lookahead says that the next three characters must be digits, but then you immediately try to match a word boundary at that same position. The two are mutually exclusive.
In r3, you are repeating the lookahead one or more times, but, since it is a lookahead, this is nonsense. You are repeating "the next three characters must be digits" over and over, but they either are or they aren't. Stating it more than once adds nothing. And you still have the problem with the word boundary.
A lookahead is like a peek function on a stack. It doesn't move the pointer forward because it doesn't consume anything. It matches a position (think of that as the space between characters). So your lookahead matches the position where three digits follow. But then the next element (the \b) matches a position where the character on the left is a word character (typically [a-zA-Z0-9_] or something like that) and the character on the right is not (whitespace, a period, etc.), or vice versa. Since the preceding lookbehind requires a digit before the position, and the lookahead requires a sequence of digits after it, it is impossible to ever have a word boundary at that position.
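To make the position argument concrete, here is a small sketch (the sample string and the variable names are just for illustration). It runs the original pattern and the r2 pattern on the same input, and also marks every place where \b can match at all:

```ruby
s = "1234567"

# \b inside the lookahead: the boundary is checked after the digit groups.
inside = s.gsub(/(?<=\d)(?=(\d\d\d)+\b)/, ",")

# \b outside the lookahead: the boundary is checked at the insertion point
# itself, which always sits between two digits.
outside = s.gsub(/(?<=\d)(?=(\d\d\d)+)\b/, ",")

# Mark every word boundary in the string to see where \b can match at all.
boundaries = s.gsub(/\b/, "|")

puts inside      # 1,234,567
puts outside     # 1234567
puts boundaries  # |1234567|
```

The only word boundaries in a plain run of digits are at its two ends, and neither end has a digit on both sides, so the r2 pattern has nowhere to match.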
Example
The following regex will always fail:
^(?=\d\d\d)\d\d\b
The ^ says that the match must start at the beginning of the input. The lookahead asserts that the next three characters must be digits (but does not consume them). The following expression says that the next two characters must be digits (and consumes them, moving the pointer forward), followed by a word boundary, which here means a non-word character or the end of the input. But this violates the lookahead, which required that the next three characters be digits. Thus, the match fails.
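A quick way to check this claim in Ruby, as a sketch only: the sample inputs and the second pattern (which consumes three digits, so it agrees with its own lookahead) are my own additions for comparison. It assumes Ruby 2.4 or later for Regexp#match?.

```ruby
# The pattern from the example: the lookahead wants three digits,
# but \b is checked after only two have been consumed.
always_fails = /^(?=\d\d\d)\d\d\b/

# A variant for contrast: it consumes three digits before checking \b.
consistent = /^(?=\d\d\d)\d\d\d\b/

samples = ["12", "123", "1234", "12 3", "12a"]

failing_hits = samples.select { |t| always_fails.match?(t) }
consistent_hits = samples.select { |t| consistent.match?(t) }

p failing_hits     # []
p consistent_hits  # ["123"]
```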
See: http://www.regular-expressions.info/lookaround.html
Upvotes: 5
Reputation: 10738
What r2 says in words is "match a word boundary that has a number of digits after it that is divisible by 3 and has a digit before it". This is a contradiction, since no boundary can have digits before AND after it. It would not be a boundary. Therefore, there is nothing this expression can match.
Upvotes: 1
Reputation: 664434
r2 and r3 have the word boundary \b directly after the lookahead, not inside it. This never matches, because you also require the position to be preceded by a digit, so it is certainly inside a word.
Btw, I'd consider the + after a lookahead invalid. If a lookahead matches once, it would of course match repeatedly at the same position. If you want to repeat the group of 3 digits, the repetition must be inside the lookahead.
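As a rough illustration of why the repetition belongs inside the lookahead (the sample string and variable names are only for this sketch): with a single group of three digits, only the last group gets a comma, while the repeated group handles every group.

```ruby
s = "1234567"

# One group of three digits, then the boundary: only the position
# exactly three digits from the end qualifies.
single_group = s.gsub(/(?<=\d)(?=\d\d\d\b)/, ",")

# One or more groups of three digits, then the boundary: every position
# that is a multiple of three digits from the end qualifies.
repeated_group = s.gsub(/(?<=\d)(?=(\d\d\d)+\b)/, ",")

puts single_group    # 1234,567
puts repeated_group  # 1,234,567
```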
Upvotes: 0