imnotneo

Reputation: 729

How can I match IPv6 addresses with a Perl regex?

I need to match an IPv6 address that may or may not have a mask (prefix length). Unfortunately, I can't just use a library to parse the string.

The mask part is easy enough; in this case:

(?:\/\d{1,3})?$/

The hard part is the different formats of an IPv6 address. It needs to match ::beef, beef::, beef::beef, etc.

Update: I'm almost there:

/^(\:\:([a-f0-9]{1,4}\:){0,6}?[a-f0-9]{0,4}|[a-f0-9]{1,4}(\:[a-f0-9]{1,4}){0,6}?\:\:|[a-f0-9]{1,4}(\:[a-f0-9]{1,4}){1,6}?\:\:([a-f0-9]{1,4}\:){1,6}?[a-f0-9]{1,4})(\/\d{1,3})?$/i

In this case, I am restricted to using Perl's regexes.

Upvotes: 6

Views: 11867

Answers (9)

Ron Maupin

Reputation: 6452

This is a comprehensive IPv6 regular expression that tests all the valid IPv6 text notations (expanded, compressed, expanded-mixed, compressed-mixed) with an optional prefix length. It also captures the various parts into capture groups; you can turn any capture group into a non-capturing one by putting ?: right after its opening parenthesis. As written, with layout whitespace, # comments, and uppercase-only hex classes, it needs the x modifier, plus the i modifier if lowercase hex digits should match.

This is the regular expression I created and use in my IPvX IP calculator for both IPv4 and IPv6.

^# Anchor
(?:# BEGIN Address alternatives, grouped so the anchors and the length apply to both
  (# BEGIN Compressed-mixed                                         *** Group 1 ***
    (# BEGIN Hexadecimal Notation                                   *** Group 2 ***
       (?:
         (?:[0-9A-F]{1,4}:){5}[0-9A-F]{1,4}            # No ::
       | (?:[0-9A-F]{1,4}:){4}:[0-9A-F]{1,4}           # 4::1
       | (?:[0-9A-F]{1,4}:){3}(?::[0-9A-F]{1,4}){1,2}  # 3::2
       | (?:[0-9A-F]{1,4}:){2}(?::[0-9A-F]{1,4}){1,3}  # 2::3
       | [0-9A-F]{1,4}:(?::[0-9A-F]{1,4}){1,4}         # 1::4
       | (?:[0-9A-F]{1,4}:){1,5}                       # :: End
       | :(?::[0-9A-F]{1,4}){1,5}                      # :: Start
       | :                                             # :: Only
       ):
    )# END Hexadecimal Notation
    (# BEGIN Dotted-decimal Notation                                *** Group 3 ***
       (25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\. # 0 to 255.  *** Group 4 ***
       (25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\. # 0 to 255.  *** Group 5 ***
       (25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\. # 0 to 255.  *** Group 6 ***
       (25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])   # 0 to 255   *** Group 7 ***
    )# END Dotted-decimal Notation
  )# END Compressed-mixed
  |
  (# BEGIN Compressed                                               *** Group 8 ***
     (?:# BEGIN Hexadecimal Notation
       (?:[0-9A-F]{1,4}:){7}[0-9A-F]{1,4}              # No ::
     | (?:[0-9A-F]{1,4}:){6}:[0-9A-F]{1,4}             # 6::1
     | (?:[0-9A-F]{1,4}:){5}(?::[0-9A-F]{1,4}){1,2}    # 5::2
     | (?:[0-9A-F]{1,4}:){4}(?::[0-9A-F]{1,4}){1,3}    # 4::3
     | (?:[0-9A-F]{1,4}:){3}(?::[0-9A-F]{1,4}){1,4}    # 3::4
     | (?:[0-9A-F]{1,4}:){2}(?::[0-9A-F]{1,4}){1,5}    # 2::5
     | [0-9A-F]{1,4}:(?::[0-9A-F]{1,4}){1,6}           # 1::6
     | (?:[0-9A-F]{1,4}:){1,7}:                        # :: End
     | :(?::[0-9A-F]{1,4}){1,7}                        # :: Start
     | ::                                              # :: Only
     )  # END Hexadecimal Notation
  )# END Compressed
)# END Address alternatives
  (?:# BEGIN Optional Length
       /(12[0-8]|1[0-1][0-9]|[1-9]?[0-9])              # /0 to /128 *** Group 9 ***
  )? # END Optional Length
$# Anchor

Bonus IPv4 regular expression:

^# Anchor
  (?:# BEGIN Dotted-decimal Notation
       (25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\. # 0 to 255.  *** Group 1 ***
       (25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\. # 0 to 255.  *** Group 2 ***
       (25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\. # 0 to 255.  *** Group 3 ***
       (25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])   # 0 to 255   *** Group 4 ***
  )  # END Dotted-decimal Notation
  (?:# BEGIN Optional Length
       /(3[0-2]|[1-2]?[0-9])                           # /0 to /32  *** Group 5 ***
  )? # END Optional Length
$# Anchor
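
As a minimal sketch of how a commented pattern like these might be used from Perl, the example below compiles the shorter IPv4 pattern with the x modifier so that the layout whitespace and # comments are ignored. The helper name parse_ipv4 is hypothetical, and the qr{...} delimiters are an assumption chosen so the / before the length needs no escaping. The IPv6 pattern would presumably be compiled the same way, with the i modifier added for lowercase hex digits.

```perl
use strict;
use warnings;

# The bonus IPv4 pattern, compiled with /x so whitespace and comments are ignored.
my $ipv4_re = qr{
  ^
  (?:
       (25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\. # Group 1
       (25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\. # Group 2
       (25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\. # Group 3
       (25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])   # Group 4
  )
  (?:
       /(3[0-2]|[1-2]?[0-9])                           # Group 5
  )?
  $
}x;

# Hypothetical helper: returns the five capture groups (the length is
# undef when absent), or an empty list when the text does not match.
sub parse_ipv4 {
    my ($text) = @_;
    my @parts = $text =~ $ipv4_re or return;
    return @parts;
}

my @parts = parse_ipv4('192.168.0.1/24');
print join(',', map { defined($_) ? $_ : '-' } @parts), "\n";
```

Using the capture groups directly avoids a second pass to split the address into octets and length.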

Upvotes: 0

Oleg Kokorin

Reputation: 2672

Here is one that worked for all the examples of IPv6 addresses I've managed to find:

/^\s*((([0-9A-Fa-f]{1,4}:){7}([0-9A-Fa-f]{1,4}|:))|(([0-9A-Fa-f]{1,4}:){6}(:[0-9A-Fa-f]{1,4}|((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3})|:))|(([0-9A-Fa-f]{1,4}:){5}(((:[0-9A-Fa-f]{1,4}){1,2})|:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3})|:))|(([0-9A-Fa-f]{1,4}:){4}(((:[0-9A-Fa-f]{1,4}){1,3})|((:[0-9A-Fa-f]{1,4})?:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){3}(((:[0-9A-Fa-f]{1,4}){1,4})|((:[0-9A-Fa-f]{1,4}){0,2}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){2}(((:[0-9A-Fa-f]{1,4}){1,5})|((:[0-9A-Fa-f]{1,4}){0,3}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){1}(((:[0-9A-Fa-f]{1,4}){1,6})|((:[0-9A-Fa-f]{1,4}){0,4}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(:(((:[0-9A-Fa-f]{1,4}){1,7})|((:[0-9A-Fa-f]{1,4}){0,5}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:)))(%.+)?\s*$/

Make sure it is on a single line before using it. It was found here:

https://community.helpsystems.com/forums/intermapper/miscellaneous-topics/5acc4fcf-fa83-e511-80cf-0050568460e4

It was verified against all the examples from the question page, the community page, and the Wikipedia article here:

https://en.wikipedia.org/wiki/IPv6

The tool used for the verification was this one:

https://regex101.com/

Upvotes: 1

Aeron

Reputation: 21

Try:

/^(((?=(?>.*?(::))(?!.+\3)))\3?|([\dA-F]{1,4}(\3|:(?!$)|$)|\2))(?4){5}((?4){2}|((2[0-4]|1\d|[1-9])?\d|25[0-5])(\.(?7)){3})\z/ai

From: http://home.deds.nl/~aeron/regex/

Upvotes: 2

Swati

Reputation: 21

This mostly works...

^([0-9a-fA-F]{0,4}|0)(\:([0-9a-fA-F]{0,4}|0)){7}$

Cons: cases that use :: are not handled correctly.
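
A small sketch of what that limitation looks like in practice: the pattern requires exactly seven colons, so compressed forms are rejected, while the {0,4} quantifier lets every group be empty, so a bare run of seven colons is accepted. The sample inputs are illustrative and not taken from the answer.

```perl
use strict;
use warnings;

# The pattern from this answer, unchanged.
my $re = qr/^([0-9a-fA-F]{0,4}|0)(\:([0-9a-fA-F]{0,4}|0)){7}$/;

# Illustrative inputs: one full address, two compressed ones,
# and a string of seven colons that is not an address at all.
my @samples = ('2001:db8:0:0:0:0:0:1', '::1', 'beef::', ':' x 7);

for my $addr (@samples) {
    printf "%-22s %s\n", $addr, ($addr =~ $re ? 'match' : 'no match');
}
```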

Upvotes: 2

Ivo

Reputation: 11

If you need to check in Perl whether a string is an IPv6 address, you can try this:

if (/(([\da-f]{0,4}:{0,2}){1,8})/i) { print("$1") };

Upvotes: 1

Schwern

Reputation: 164639

This contains a patch to Regexp::Common demonstrating a complete, accurate, tested IPv6 regex. It's a straight translation of the IPv6 grammar. Regexp::IPv6 is also accurate.

More importantly, it contains a test suite. Running it with your regex shows you're still a ways off. 10 out of 19 missed. 1 out of 12 false positives. IPv6 contains a lot of special shorthands making it very easy to get subtly wrong.
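
The idea of scoring a regex against known-good and known-bad addresses can be sketched roughly as follows. This is not the test suite referred to above: the helper name score_regex, the sample lists, and the deliberately naive candidate pattern are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: collect the valid addresses a pattern misses
# and the invalid strings it wrongly accepts.
sub score_regex {
    my ($re, $valid, $invalid) = @_;
    my @missed          = grep { $_ !~ $re } @$valid;
    my @false_positives = grep { $_ =~ $re } @$invalid;
    return (\@missed, \@false_positives);
}

# Illustrative sample lists, far smaller than a real test suite.
my @valid   = ('::', '::1', 'beef::', '1:2:3:4:5:6:7:8', '::ffff:192.0.2.1');
my @invalid = ('', ':', '1:2:3:4:5:6:7:8:9', '1::2::3', '12345::');

# A deliberately naive candidate: eight explicit groups and nothing else.
my $candidate = qr/^[0-9a-f]{1,4}(?::[0-9a-f]{1,4}){7}$/i;

my ($missed, $fp) = score_regex($candidate, \@valid, \@invalid);
printf "%d out of %d missed\n", scalar @$missed, scalar @valid;
printf "%d out of %d false positives\n", scalar @$fp, scalar @invalid;
print "missed: $_\n" for @$missed;
```

Swapping in another pattern for $candidate and growing the two lists gives a quick way to see which notations a regex still gets wrong.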

Best place to read up on what goes into an IPv6 address is RFC 3986 section 3.2.2.

Upvotes: 13

innaM

Reputation: 47829

What do you mean you can't just use a library? How about a module? Regexp::IPv6 will give you what you need.

Upvotes: 10

tsee

Reputation: 5072

I'm not an IPv6 expert, but please trust me when I tell you that matching (let alone validating) IPv6 addresses is not easy with a very simple regex such as the one you suggest. There are many shorthands and various conventions for combining the address with a port, just to name one example. One such shorthand is that you can write 0:0:0:0:0:0:0:1 as ::1, but there are more. If you read German, I would suggest looking at the slides of Steffen Ullrich's talk at the 11th German Perl Workshop.
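
To illustrate just the :: shorthand mentioned above, here is a rough sketch that expands it back into eight explicit groups. The helper name expand_ipv6 is hypothetical, and the sketch assumes plain hexadecimal notation only: embedded IPv4, zone IDs, and ports are not handled, which is part of why a full solution is harder than it looks.

```perl
use strict;
use warnings;

# Hypothetical helper: expand a "::" shorthand into eight explicit groups.
# Returns undef (in scalar context) when the input is not a plain
# hexadecimal IPv6 address.
sub expand_ipv6 {
    my ($addr) = @_;
    my @halves = split /::/, $addr, -1;   # negative limit keeps a trailing empty field
    return unless @halves == 1 || @halves == 2;   # "::" may appear at most once

    my @head = split /:/, $halves[0], -1;
    my @tail = @halves == 2 ? split(/:/, $halves[1], -1) : ();

    my $fill = 0;
    if (@halves == 2) {
        return if @head + @tail > 7;      # "::" must stand for at least one group
        $fill = 8 - (@head + @tail);
    }
    else {
        return unless @head == 8;         # without "::" all eight groups are required
    }

    for my $group (@head, @tail) {
        return unless $group =~ /\A[0-9a-f]{1,4}\z/i;
    }
    return join ':', @head, ('0') x $fill, @tail;
}

for my $addr ('::1', 'beef::', 'beef::beef', '1::2::3') {
    my $full = expand_ipv6($addr) // 'invalid';
    print "$addr => $full\n";
}
```

Once the address is expanded, checking it reduces to matching eight groups of one to four hex digits, which is the easy case.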

You say you can't use a library, but if you're going to reinvent the whole complexity of the library, you might as well just import it verbatim into your project.

Upvotes: 5

Rubens Farias

Reputation: 57936

Try this:

^([0-9a-fA-F]{4}|0)(\:([0-9a-fA-F]{4}|0)){7}$

From Regular Expression Library: IPv6 address

You should also read this: A Regular Expression for IPv6 Addresses

Upvotes: 1
