Reputation: 14970
I have a regular expression for IPv6 addresses, given below:
IPV4ADDRESS [ \t]*(([[:digit:]]{1,3}"."){3}([[:digit:]]{1,3}))[ \t]*
x4 ([[:xdigit:]]{1,4})
xseq ({x4}(:{x4}){0,7})
xpart ({xseq}|({xseq}::({xseq}?))|::{xseq})
IPV6ADDRESS [ \t]*({xpart}(":"{IPV4ADDRESS})?)[ \t]*
It correctly matches all formats of IPv6 addresses, including:
1) non-compressed IPv6 addresses
2) compressed IPv6 addresses
3) IPv6 addresses in legacy formats (with an embedded IPv4 address)
Typical examples of IPv6 addresses in legacy formats would be:
2001:1234::3210:5.6.7.8
OR
2001:1234:1234:5432:4578:5678:5.6.7.8
As you can see, the second example has 10 groups separated by either ":" or ".", as opposed to 8 groups in a normal IPv6 address. This is because the last 4 groups, which are separated by ".", are packed into the least significant 32 bits of the IPv6 address. Hence we need 10 groups to make up 128 bits.
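To make the packing concrete, here is a small sketch using Python's standard `ipaddress` module (Python is only used for illustration; the pattern in question is a flex definition). It parses the full-length example above and shows that the dotted tail `5.6.7.8` ends up as the two lowest 16-bit groups.

```python
import ipaddress

# Full-length address with a dotted IPv4 tail (taken from the examples above).
addr = ipaddress.IPv6Address("2001:1234:1234:5432:4578:5678:5.6.7.8")

# The four decimal octets occupy the least significant 32 bits,
# so they appear as the last two hex groups of the exploded form.
exploded = addr.exploded
low32 = int(addr) & 0xFFFFFFFF

print(exploded)
print(hex(low32))
```

The last two groups of the exploded form are `0506:0708`, which is `5.6.7.8` written as four bytes in hex.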
However, if I use the following address format:
2001:1234:4563:3210:5.6.7.8
Here each group separated by ":" represents 16 bits, and each of the last four groups separated by "." represents 8 bits. The total number of bits is 64 + 32 = 96, so 32 bits are missing.
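The same arithmetic can be written as a short sketch. The helper `address_bits` is a hypothetical name introduced here, and it assumes an uncompressed address (no "::"), counting 16 bits per hex group and 8 bits per dotted octet.

```python
def address_bits(text: str) -> int:
    """Count the bits spelled out by an uncompressed address (no "::")."""
    bits = 0
    for group in text.split(":"):
        if "." in group:
            # Dotted IPv4 tail: 8 bits per decimal octet.
            bits += 8 * len(group.split("."))
        else:
            # Ordinary hex group: 16 bits.
            bits += 16
    return bits


short = address_bits("2001:1234:4563:3210:5.6.7.8")
full = address_bits("2001:1234:1234:5432:4578:5678:5.6.7.8")
print(short, full)
```

Only the second address reaches the 128 bits that an IPv6 address requires.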
The regular expression accepts it as a valid IPv6 address. I am unable to figure out how to fix the regular expression to reject such values. Any help is highly appreciated.
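The behaviour can be reproduced outside flex with a rough transcription of the definitions into Python's `re` syntax. This is a sketch, not the flex scanner itself: `[[:xdigit:]]` and `[[:digit:]]` are spelled out as explicit character classes because Python does not support POSIX classes, and the names `accepts`, `X4`, `XSEQ`, `XPART`, `IPV4` and `IPV6` are chosen here for illustration.

```python
import re

# Rough transcription of the flex definitions from the question.
X4 = r"[0-9A-Fa-f]{1,4}"
XSEQ = rf"{X4}(?::{X4}){{0,7}}"
XPART = rf"(?:{XSEQ}|{XSEQ}::(?:{XSEQ})?|::{XSEQ})"
IPV4 = r"[ \t]*(?:[0-9]{1,3}\.){3}[0-9]{1,3}[ \t]*"
IPV6 = re.compile(rf"[ \t]*{XPART}(?::{IPV4})?[ \t]*")


def accepts(text: str) -> bool:
    """Return True if the whole string matches the transcribed pattern."""
    return IPV6.fullmatch(text) is not None


for candidate in (
    "2001:1234:1234:5432:4578:5678:5.6.7.8",  # 6 hex groups + IPv4 tail
    "2001:1234:4563:3210:5.6.7.8",            # only 4 hex groups + IPv4 tail
    "1:2:3:4:5:6:7:8:1.2.3.4",                # 8 hex groups + IPv4 tail
):
    print(candidate, accepts(candidate))
```

All three candidates match, because `xseq` allows anywhere from 1 to 8 hex groups and the optional IPv4 tail is appended without changing that allowance, so nothing ties the group count to 128 bits.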
Upvotes: 4
Views: 2210
Reputation: 179412
Here's the grammar for IPv6 addresses as given in RFC 3986 and subsequently affirmed in RFC 5954:
IPv6address =                            6( h16 ":" ) ls32
            /                       "::" 5( h16 ":" ) ls32
            / [               h16 ] "::" 4( h16 ":" ) ls32
            / [ *1( h16 ":" ) h16 ] "::" 3( h16 ":" ) ls32
            / [ *2( h16 ":" ) h16 ] "::" 2( h16 ":" ) ls32
            / [ *3( h16 ":" ) h16 ] "::"    h16 ":"   ls32
            / [ *4( h16 ":" ) h16 ] "::"              ls32
            / [ *5( h16 ":" ) h16 ] "::"              h16
            / [ *6( h16 ":" ) h16 ] "::"
h16         = 1*4HEXDIG
ls32        = ( h16 ":" h16 ) / IPv4address
IPv4address = dec-octet "." dec-octet "." dec-octet "." dec-octet
dec-octet   = DIGIT                 ; 0-9
            / %x31-39 DIGIT         ; 10-99
            / "1" 2DIGIT            ; 100-199
            / "2" %x30-34 DIGIT     ; 200-249
            / "25" %x30-35          ; 250-255
Using this, we can build a standards-compliant regular expression for IPv6 addresses.
dec_octet ([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])
ipv4address ({dec_octet}"."){3}{dec_octet}
h16 ([[:xdigit:]]{1,4})
ls32 ({h16}:{h16}|{ipv4address})
ipv6address (({h16}:){6}{ls32}|::({h16}:){5}{ls32}|({h16})?::({h16}:){4}{ls32}|(({h16}:){0,1}{h16})?::({h16}:){3}{ls32}|(({h16}:){0,2}{h16})?::({h16}:){2}{ls32}|(({h16}:){0,3}{h16})?::{h16}:{ls32}|(({h16}:){0,4}{h16})?::{ls32}|(({h16}:){0,5}{h16})?::{h16}|(({h16}:){0,6}{h16})?::)
Disclaimer: untested.
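One hedged way to sanity-check the grammar without a flex build is to transcribe it into Python's `re` syntax and try a few inputs. The sketch below is such a transcription, not the flex pattern itself: `[[:xdigit:]]` is written as an explicit class, the `dec_octet` alternatives are listed longest first, and the names `is_ipv6`, `DEC_OCTET`, `IPV4`, `H16`, `LS32` and `IPV6` are introduced here for illustration.

```python
import re

# Building blocks, following the RFC 3986 rules one to one.
DEC_OCTET = r"(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])"
IPV4 = rf"(?:{DEC_OCTET}\.){{3}}{DEC_OCTET}"
H16 = r"[0-9A-Fa-f]{1,4}"
LS32 = rf"(?:{H16}:{H16}|{IPV4})"

# One alternative per line of the IPv6address rule.
IPV6 = re.compile(
    "(?:"
    rf"(?:{H16}:){{6}}{LS32}"
    rf"|::(?:{H16}:){{5}}{LS32}"
    rf"|(?:{H16})?::(?:{H16}:){{4}}{LS32}"
    rf"|(?:(?:{H16}:){{0,1}}{H16})?::(?:{H16}:){{3}}{LS32}"
    rf"|(?:(?:{H16}:){{0,2}}{H16})?::(?:{H16}:){{2}}{LS32}"
    rf"|(?:(?:{H16}:){{0,3}}{H16})?::{H16}:{LS32}"
    rf"|(?:(?:{H16}:){{0,4}}{H16})?::{LS32}"
    rf"|(?:(?:{H16}:){{0,5}}{H16})?::{H16}"
    rf"|(?:(?:{H16}:){{0,6}}{H16})?::"
    ")"
)


def is_ipv6(text: str) -> bool:
    """Return True if the whole string matches the RFC 3986 grammar."""
    return IPV6.fullmatch(text) is not None


for candidate in (
    "2001:1234::3210:5.6.7.8",
    "2001:1234:1234:5432:4578:5678:5.6.7.8",
    "2001:1234:4563:3210:5.6.7.8",
):
    print(candidate, is_ipv6(candidate))
```

In this transcription the first two addresses from the question match, while the 96-bit address `2001:1234:4563:3210:5.6.7.8` is rejected: without a "::" only the first alternative can apply, and it requires exactly six hex groups before the IPv4 tail. Unlike the original pattern, this one does not allow surrounding spaces or tabs, so those would need to be added back if the scanner relies on them.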
Upvotes: 5