Reputation: 763
Trying to learn regex subroutines. I wrote this regex to match IP addresses. It works in Notepad++, but when I tried it in an online PCRE tester it only matches IPs with at most 2 digits in the last group. Can you help me understand why?
\b((\d{1,2}|[01]\d{2}|2[0-4]\d|25[0-5])\.){3}(?2)\b
In the example " 192.168.0.219 192.168.0.21 ", Notepad++ finds 2 matches, while PCRE (regex101.com) matches only the second address.
Upvotes: 4
Views: 122
Reputation: 5308
Notepad++ uses Boost for its regex engine (see: Which regex engine does Notepad++ use?), so that may explain the difference.
The problem is the \d{1,2} alternative, which will not work as you expect with recursion (on PCRE). In the non-recursive case, you are forced to find a dot after the number, so the engine backtracks and tries a longer alternative when the dot is missing.
But since the recursion targets group 2, you enter the recursed pattern, match \d{1,2} (21 from 219), and end the recursion. On exit, a \b is expected, but a 9 is found instead, so the match fails.
Perhaps the Boost engine considers the whole expression before entering the recursion. Or perhaps it has a different backtracking system that can backtrack into the recursion and re-evaluate it with another of the group's alternatives. In the end, different implementations produce different results.
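The failure described above can be reproduced outside PCRE with a rough sketch. Python's re module has no (?2) subroutine call, so the octet pattern is written out a second time, and the last octet is wrapped in (?=(...))\1 to emulate a call that cannot be backtracked into. That "no backtracking into the call" behaviour is an assumption taken from the explanation above, not something Python does for PCRE; the names OCTET, pattern and matches are only illustrative.

```python
import re

# Original alternation order from the question: \d{1,2} comes first.
OCTET = r"\d{1,2}|[01]\d{2}|2[0-4]\d|25[0-5]"

# Python's re has no (?2) subroutine call, so the octet is spelled out again.
# (?=(...))\1 emulates an atomic call: a lookahead is not re-entered once it
# has succeeded, so the last octet cannot be retried with another alternative.
pattern = re.compile(rf"\b(?:(?:{OCTET})\.){{3}}(?=({OCTET}))\1\b")

text = " 192.168.0.219 192.168.0.21 "
matches = [m.group(0) for m in pattern.finditer(text)]
print(matches)  # ['192.168.0.21']
```

Only the second address is found, which matches what the PCRE tester showed: the last octet of 192.168.0.219 is taken as 21, and the \b that follows cannot match before the 9.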
To make both engines behave the same, you can use this:
\b(([01]\d{2}|2[0-4]\d|25[0-5]|\d{1,2})\.){3}(?2)\b
That is, you place \d{1,2} as the last option.
In general, it is good practice to sort the alternatives in a group (for example, (aaa|aa|a)) so that the longest patterns come first when they can overlap.
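A minimal sketch of why the order matters, using Python's re module as a stand-in for any engine that tries alternatives from left to right. The variable names are only illustrative.

```python
import re

text = "aaa"

# Alternatives are tried left to right; the first one that matches wins.
short_first = re.match(r"a|aa|aaa", text).group(0)
long_first = re.match(r"aaa|aa|a", text).group(0)

print(short_first)  # a
print(long_first)   # aaa
```

With the short alternative first, the longer ones are only reached if something later in the pattern fails and the engine is able to backtrack into the group.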
As an alternative, if you want to keep the same order in the group, you could use:
\b((\d{1,2}(?!\d)|[01]\d{2}|2[0-4]\d|25[0-5])\.){3}(?2)\b
(We add a negative lookahead after \d{1,2}: there must not be a digit after it.)
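Both fixes can be checked with a small sketch in Python. As an assumption, the (?2) call is emulated by repeating the octet pattern inside (?=(...))\1, which prevents backtracking into the last octet, since Python's re has no subroutine calls. The helper find_ips and the names reordered and guarded are hypothetical.

```python
import re

def find_ips(octet, text):
    """Return all matches, with the last octet matched without backtracking."""
    # (?=(...))\1 stands in for the (?2) call: once the lookahead succeeds,
    # the engine does not go back inside it to try another alternative.
    pattern = re.compile(rf"\b(?:(?:{octet})\.){{3}}(?=({octet}))\1\b")
    return [m.group(0) for m in pattern.finditer(text)]

text = " 192.168.0.219 192.168.0.21 "

# Fix 1: longest alternatives first, \d{1,2} last.
reordered = r"[01]\d{2}|2[0-4]\d|25[0-5]|\d{1,2}"
# Fix 2: original order, with a negative lookahead guarding \d{1,2}.
guarded = r"\d{1,2}(?!\d)|[01]\d{2}|2[0-4]\d|25[0-5]"

print(find_ips(reordered, text))  # ['192.168.0.219', '192.168.0.21']
print(find_ips(guarded, text))    # ['192.168.0.219', '192.168.0.21']
```

In both variants the octet picks the right alternative on the first attempt, so the result no longer depends on whether the engine can backtrack into the call.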
Upvotes: 3