krackmoe

Reputation: 1763

Regex pipe after square brackets

I found a regex that I don't quite understand.

It looks like this:

([|)\b(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b(]|)

I understand that it tries to match digits such as 255.255, and that the match should be a complete word.

But what are the "([|)" and "(]|)" for? The square bracket and the pipe in the last one also seem to be in the wrong order.

Upvotes: 0

Views: 982

Answers (2)

Ulugbek Umirov

Reputation: 12797

The purpose of the regex is unclear. Debuggex produces a nice visualization.

Regular expression visualization


The part that matches 0 to 255 is clear (000 and 00 are also accepted values). But there is no apparent reason for trying to match the symbols |)([].
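
The 0 to 255 claim can be checked by brute force. This is a minimal sketch in Python; the question does not say which engine is used, so Python's `re` module is an assumption, and `OCTET` and `is_octet` are names made up for illustration.

```python
import re

# The octet alternation copied from the regex in the question.
OCTET = r"25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?"

def is_octet(text: str) -> bool:
    """Return True if the whole string matches the octet alternation."""
    return re.fullmatch(OCTET, text) is not None

# Collect every integer below 1000 whose decimal form is accepted.
accepted = [n for n in range(0, 1000) if is_octet(str(n))]
print(accepted == list(range(256)))  # only 0..255 should get through

# Zero-padded forms of up to three digits are accepted as well.
padded = [s for s in ("0", "00", "000", "007", "0000") if is_octet(s)]
print(padded)
```

The alternation never consumes more than three digits, which is why a four-digit string such as "0000" is rejected while "000" is not.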

I believe the first [ and the last ] are there by mistake. Without them, the inner regex looks reasonable. But (|) and \b also don't look right, so my guess is that (|) can be omitted too.

(|)\b(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b(|)
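
To see that the empty `(|)` groups change nothing about what is matched, here is a rough sketch in Python (an assumed engine; the question does not name one). The helper `spans` and the sample string are invented for illustration, and the variant without `(|)` is only a guess at the intent.

```python
import re

OCTET = r"(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"

# The simplified regex from above, with the empty (|) groups kept.
WITH_EMPTY_GROUPS = r"(|)\b" + OCTET + r"\." + OCTET + r"\b(|)"
# The same regex with the (|) groups dropped (a guess at the intent).
WITHOUT_EMPTY_GROUPS = r"\b" + OCTET + r"\." + OCTET + r"\b"

def spans(pattern: str, text: str) -> list:
    """Return the text of every non-overlapping match of pattern in text."""
    return [m.group(0) for m in re.finditer(pattern, text)]

sample = "mask 255.255 and 10.0, but not x1.2 or 999.1"

print(spans(WITH_EMPTY_GROUPS, sample))
print(spans(WITHOUT_EMPTY_GROUPS, sample))
```

Both patterns find the same two matches here. The only difference is that the first one carries two extra capture groups that are always empty.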

Regular expression visualization


Upvotes: 1

zx81

Reputation: 41838

krackmoe, interestingly, there is no ([|): it is an optical illusion.

The regex engine does not see ([|)

It sees ( which opens capture Group 1, then it sees a character class [|)\b(25[0-5] which does not make a whole lot of sense for several reasons. For instance, inside a character class \b is no longer a word boundary (in most flavors, such as PCRE, JavaScript, Python and .NET, it stands for a backspace character), and the characters 2 and 5 are redundant with the range 0-5.

So you are quite right not to understand it.

I presume the author wanted to put a word boundary there, but as it stands, it is a typo.
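
The accidental character class can be probed on its own. This sketch assumes Python's `re` module, where `\b` inside a class is a backspace; other flavors may treat it differently. `CHAR_CLASS` and `in_class` are illustrative names, not part of the original regex.

```python
import re

# The accidental character class, exactly as the engine reads it.
CHAR_CLASS = r"[|)\b(25[0-5]"

def in_class(ch: str) -> bool:
    """Return True if the single character ch is matched by the class."""
    return re.fullmatch(CHAR_CLASS, ch) is not None

# Candidates the class should accept: the listed symbols, the digits
# 0 to 5, and a backspace character ("\x08").
accepted = [ch for ch in "|)(25[01234\x08" if in_class(ch)]

# Candidates it should reject: the letter "b", digits above 5, and "]".
rejected = [ch for ch in "b6789]" if not in_class(ch)]

print(accepted)
print(rejected)
```

Every candidate in the first string is accepted and every candidate in the second is rejected, so in this engine the class is equivalent to the shorter `[|)(\[0-5\x08]`.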

For reference, here is a token-by-token explanation of the regex. (Don't worry, I didn't type all that, it was automatically generated by RegexBuddy.)

* Match the regex below and capture its match into backreference number 1 `([|)\b(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)`
    * Match this alternative (attempting the next alternative only if this one fails) `[|)\b(25[0-5]`
        * Match a single character present in the list below `[|)\b(25[0-5]`
            * A single character from the list “|)” `|)`
            * The character `\b`
            * A single character from the list “(25[” `(25[`
            * A character in the range between “0” and “5” `0-5`
    * Or match this alternative (attempting the next alternative only if this one fails) `2[0-4][0-9]`
        * Match the character “2” literally `2`
        * Match a single character in the range between “0” and “4” `[0-4]`
        * Match a single character in the range between “0” and “9” `[0-9]`
    * Or match this alternative (the entire group fails if this one fails to match) `[01]?[0-9][0-9]?`
        * Match a single character from the list “01” `[01]?`
            * Between zero and one times, as many times as possible, giving back as needed (greedy) `?`
        * Match a single character in the range between “0” and “9” `[0-9]`
        * Match a single character in the range between “0” and “9” `[0-9]?`
            * Between zero and one times, as many times as possible, giving back as needed (greedy) `?`
* Match the character “.” literally `\.`
* Match the regex below and capture its match into backreference number 2 `(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)`
    * Match this alternative (attempting the next alternative only if this one fails) `25[0-5]`
        * Match the character string “25” literally `25`
        * Match a single character in the range between “0” and “5” `[0-5]`
    * Or match this alternative (attempting the next alternative only if this one fails) `2[0-4][0-9]`
        * Match the character “2” literally `2`
        * Match a single character in the range between “0” and “4” `[0-4]`
        * Match a single character in the range between “0” and “9” `[0-9]`
    * Or match this alternative (the entire group fails if this one fails to match) `[01]?[0-9][0-9]?`
        * Match a single character from the list “01” `[01]?`
            * Between zero and one times, as many times as possible, giving back as needed (greedy) `?`
        * Match a single character in the range between “0” and “9” `[0-9]`
        * Match a single character in the range between “0” and “9” `[0-9]?`
            * Between zero and one times, as many times as possible, giving back as needed (greedy) `?`
* Assert position at a word boundary (position preceded or followed—but not both—by a Unicode letter, digit, or underscore) `\b`
* Match the regex below and capture its match into backreference number 3 `(]|)`
    * Match this alternative (attempting the next alternative only if this one fails) `]`
        * Match the character “]” literally `]`
    * Or match this alternative (the entire group fails if this one fails to match)
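
The walk-through above can be cross-checked against a real engine. This is a sketch assuming Python's `re` module (RegexBuddy may have been set to a different flavor), with sample inputs made up for illustration.

```python
import re

# The regex from the question, unchanged (split over two lines only).
PATTERN = re.compile(
    r"([|)\b(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"
    r"\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b(]|)"
)

# Three capture groups, as in the walk-through, rather than the four
# that a quick reading of the parentheses suggests.
group_count = PATTERN.groups

# Group 1 can be a stray "(" because "(" sits inside the character class.
stray = PATTERN.search("(.7").groups()

# Group 3 picks up a trailing "]" when one is present.
bracket = PATTERN.search("1.2]").groups()

# On "255.255" no match starts at the first character, so the leftmost
# match begins one character later.
partial = PATTERN.search("255.255").group(0)

print(group_count, stray, bracket, partial)
```

The last case shows the practical cost of the typo: because the first `25[0-5]` alternative was swallowed by the character class, the first group can no longer match a three-digit number starting with 25.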

Upvotes: 1
