DonGar
DonGar

Reputation: 7644

Regular expression to match DNS hostname or IP Address?

Does anyone have a regular expression handy that will match any legal DNS hostname or IP address?

It's easy to write one that works 95% of the time, but I'm hoping to get something that's well tested to exactly match the latest RFC specs for DNS hostnames.

Upvotes: 416

Views: 452611

Answers (22)

dbaker
dbaker

Reputation: 309

There's a further nuance here that's missing.

It's true that a HOSTNAME should match, basically, what's been given above.

What's missing is that a REFERENCE TO a hostname can be the same, plus an optional period on the end.

For example, with a trailing period, ping foo.bar.svc.cluster.local. will ping that hostname only, and not attempt any DNS search options in resolv.conf.

tldr - If you provide an input box to receive a hostname, what's entered does not actually need to be a valid hostname.

Upvotes: 0

user2240578
user2240578

Reputation: 216

/^(?:[a-zA-Z0-9]+|[a-zA-Z0-9][-a-zA-Z0-9]+[a-zA-Z0-9])(?:\.[a-zA-Z0-9]+|[a-zA-Z0-9][-a-zA-Z0-9]+[a-zA-Z0-9])?$/

Upvotes: 1

Darrell Root
Darrell Root

Reputation: 844

The new Network framework has failable initializers for struct IPv4Address and struct IPv6Address which handle the IP address portion very easily. Doing this in IPv6 with a regex is tough with all the shortening rules.

Unfortunately I don't have an elegant answer for hostname.

Note that Network framework is recent, so it may force you to compile for recent OS versions.

import Network
let tests = ["192.168.4.4","fkjhwojfw","192.168.4.4.4","2620:3","2620::33"]

for test in tests {
    if let _ = IPv4Address(test) {
        debugPrint("\(test) is valid ipv4 address")
    } else if let _ = IPv6Address(test) {
        debugPrint("\(test) is valid ipv6 address")
    } else {
        debugPrint("\(test) is not a valid IP address")
    }
}

output:
"192.168.4.4 is valid ipv4 address"
"fkjhwojfw is not a valid IP address"
"192.168.4.4.4 is not a valid IP address"
"2620:3 is not a valid IP address"
"2620::33 is valid ipv6 address"

Upvotes: 1

abarnert
abarnert

Reputation: 365597

It's worth noting that there are libraries for most languages that do this for you, often built into the standard library. And those libraries are likely to get updated a lot more often than code that you copied off a Stack Overflow answer four years ago and forgot about. And of course they'll also generally parse the address into some usable form, rather than just giving you a match with a bunch of groups.

For example, detecting and parsing IPv4 in (POSIX) C:

#include <arpa/inet.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
  for (int i=1; i!=argc; ++i) {
    struct in_addr addr = {0};
    printf("%s: ", argv[i]);
    if (inet_pton(AF_INET, argv[i], &addr) != 1)
      printf("invalid\n");
    else
      printf("%u\n", addr.s_addr);
  }
  return 0;
}

Obviously, such functions won't work if you're trying to, e.g., find all valid addresses in a chat message—but even there, it may be easier to use a simple but overzealous regex to find potential matches, and then use the library to parse them.

For example, in Python:

>>> import ipaddress
>>> import re
>>> msg = "My address is 192.168.0.42; 192.168.0.420 is not an address"
>>> for maybeip in re.findall(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}', msg):
...     try:
...         print(ipaddress.ip_address(maybeip))
...     except ValueError:
...         pass

Upvotes: 7

PythonDev
PythonDev

Reputation: 4317

def isValidHostname(hostname):

    if len(hostname) > 255:
        return False
    if hostname[-1:] == ".":
        hostname = hostname[:-1]   # strip exactly one dot from the right,
                                   #  if present
    allowed = re.compile("(?!-)[A-Z\d-]{1,63}(?<!-)$", re.IGNORECASE)
    return all(allowed.match(x) for x in hostname.split("."))

Upvotes: 2

Alban
Alban

Reputation: 3215

To match a valid IP address use the following regex:

(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)){3}

instead of:

([01]?[0-9][0-9]?|2[0-4][0-9]|25[0-5])(\.([01]?[0-9][0-9]?|2[0-4][0-9]|25[0-5])){3}

Explanation

Many regex engine match the first possibility in the OR sequence. For instance, try the following regex:

10.48.0.200

Test

Test the difference between good vs bad

Upvotes: 36

Mohammad Shahid Siddiqui
Mohammad Shahid Siddiqui

Reputation: 4160

>>> my_hostname = "testhostn.ame"
>>> print bool(re.match("^(([a-zA-Z]|[a-zA-Z][a-zA-Z0-9\-]*[a-zA-Z0-9])\.)*([A-Za-z]|[A-Za-z][A-Za-z0-9\-]*[A-Za-z0-9])$", my_hostname))
True
>>> my_hostname = "testhostn....ame"
>>> print bool(re.match("^(([a-zA-Z]|[a-zA-Z][a-zA-Z0-9\-]*[a-zA-Z0-9])\.)*([A-Za-z]|[A-Za-z][A-Za-z0-9\-]*[A-Za-z0-9])$", my_hostname))
False
>>> my_hostname = "testhostn.A.ame"
>>> print bool(re.match("^(([a-zA-Z]|[a-zA-Z][a-zA-Z0-9\-]*[a-zA-Z0-9])\.)*([A-Za-z]|[A-Za-z][A-Za-z0-9\-]*[A-Za-z0-9])$", my_hostname))
True

Upvotes: 1

sirjay
sirjay

Reputation: 1766

on php: filter_var(gethostbyname($dns), FILTER_VALIDATE_IP) == true ? 'ip' : 'not ip'

Upvotes: -1

Dody
Dody

Reputation: 307

I thought about this simple regex matching pattern for IP address matching \d+[.]\d+[.]\d+[.]\d+

Upvotes: -2

Thom Anderson
Thom Anderson

Reputation: 11

Regarding IP addresses, it appears that there is some debate on whether to include leading zeros. It was once the common practice and is generally accepted, so I would argue that they should be flagged as valid regardless of the current preference. There is also some ambiguity over whether text before and after the string should be validated and, again, I think it should. 1.2.3.4 is a valid IP but 1.2.3.4.5 is not and neither the 1.2.3.4 portion nor the 2.3.4.5 portion should result in a match. Some of the concerns can be handled with this expression:

grep -E '(^|[^[:alnum:]+)(([0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])\.){3}([0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])([^[:alnum:]]|$)' 

The unfortunate part here is the fact that the regex portion that validates an octet is repeated as is true in many offered solutions. Although this is better than for instances of the pattern, the repetition can be eliminated entirely if subroutines are supported in the regex being used. The next example enables those functions with the -P switch of grep and also takes advantage of lookahead and lookbehind functionality. (The function name I selected is 'o' for octet. I could have used 'octet' as the name but wanted to be terse.)

grep -P '(?<![\d\w\.])(?<o>([0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5]))(\.\g<o>){3}(?![\d\w\.])'

The handling of the dot might actually create a false negatives if IP addresses are in a file with text in the form of sentences since the a period could follow without it being part of the dotted notation. A variant of the above would fix that:

grep -P '(?<![\d\w\.])(?<x>([0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5]))(\.\g<x>){3}(?!([\d\w]|\.\d))'

Upvotes: 0

seraphim
seraphim

Reputation: 9

try this:

((2[0-4]\d|25[0-5]|[01]?\d\d?)\.){3}(2[0-4]\d|25[0-5]|[01]?\d\d?)

it works in my case.

Upvotes: 0

aliasav
aliasav

Reputation: 3168

This works for valid IP addresses:

regex = '^([0-9]|[1-9][0-9]|[1][0-9][0-9]|[2][0-5][0-5])[.]([0-9]|[1-9][0-9]|[1][0-9][0-9]|[2][0-5][0-5])[.]([0-9]|[1-9][0-9]|[1][0-9][0-9]|[2][0-5][0-5])[.]([0-9]|[1-9][0-9]|[1][0-9][0-9]|[2][0-5][0-5])$'

Upvotes: 1

ayu for u
ayu for u

Reputation: 207

AddressRegex = "^(ftp|http|https):\/\/([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}:[0-9]{1,5})$";

HostnameRegex =  /^(ftp|http|https):\/\/([a-z0-9]+\.)?[a-z0-9][a-z0-9-]*((\.[a-z]{2,6})|(\.[a-z]{2,6})(\.[a-z]{2,6}))$/i

this re are used only for for this type validation

work only if http://www.kk.com http://www.kk.co.in

not works for

http://www.kk.com/ http://www.kk.co.in.kk

http://www.kk.com/dfas http://www.kk.co.in/

Upvotes: 0

zangw
zangw

Reputation: 48346

"^((\\d{1,2}|1\\d{2}|2[0-4]\\d|25[0-5])\.){3}(\\d{1,2}|1\\d{2}|2[0-4]\\d|25[0-5])$"

Upvotes: 1

Andrew
Andrew

Reputation: 203

I found this works pretty well for IP addresses. It validates like the top answer but it also makes sure the ip is isolated so no text or more numbers/decimals are after or before the ip.

(?<!\S)(?:(?:\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\b|.\b){7}(?!\S)

Upvotes: 0

Saikrishna Rao
Saikrishna Rao

Reputation: 615

how about this?

([0-9]{1,3}\.){3}[0-9]{1,3}

Upvotes: -1

Jorge Ferreira
Jorge Ferreira

Reputation: 97839

You can use the following regular expressions separately or by combining them in a joint OR expression.

ValidIpAddressRegex = "^(([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])$";

ValidHostnameRegex = "^(([a-zA-Z0-9]|[a-zA-Z0-9][a-zA-Z0-9\-]*[a-zA-Z0-9])\.)*([A-Za-z0-9]|[A-Za-z0-9][A-Za-z0-9\-]*[A-Za-z0-9])$";

ValidIpAddressRegex matches valid IP addresses and ValidHostnameRegex valid host names. Depending on the language you use \ could have to be escaped with \.


ValidHostnameRegex is valid as per RFC 1123. Originally, RFC 952 specified that hostname segments could not start with a digit.

http://en.wikipedia.org/wiki/Hostname

The original specification of hostnames in RFC 952, mandated that labels could not start with a digit or with a hyphen, and must not end with a hyphen. However, a subsequent specification (RFC 1123) permitted hostname labels to start with digits.

Valid952HostnameRegex = "^(([a-zA-Z]|[a-zA-Z][a-zA-Z0-9\-]*[a-zA-Z0-9])\.)*([A-Za-z]|[A-Za-z][A-Za-z0-9\-]*[A-Za-z0-9])$";

Upvotes: 596

Thangaraj
Thangaraj

Reputation: 73

Checking for host names like... mywebsite.co.in, thangaraj.name, 18thangaraj.in, thangaraj106.in etc.,

[a-z\d+].*?\\.\w{2,4}$

Upvotes: -2

Prakash Thapa
Prakash Thapa

Reputation: 1285

I think this is the best Ip validation regex. please check it once!!!

^(([01]?[0-9]?[0-9]|2([0-4][0-9]|5[0-5]))\.){3}([01]?[0-9]?[0-9]|2([0-4][0-9]|5[0-5]))$

Upvotes: 2

Sakari A. Maaranen
Sakari A. Maaranen

Reputation: 791

The hostname regex of smink does not observe the limitation on the length of individual labels within a hostname. Each label within a valid hostname may be no more than 63 octets long.

ValidHostnameRegex="^([a-zA-Z0-9]|[a-zA-Z0-9][a-zA-Z0-9\-]{0,61}[a-zA-Z0-9])\
(\.([a-zA-Z0-9]|[a-zA-Z0-9][a-zA-Z0-9\-]{0,61}[a-zA-Z0-9]))*$"

Note that the backslash at the end of the first line (above) is Unix shell syntax for splitting the long line. It's not a part of the regular expression itself.

Here's just the regular expression alone on a single line:

^([a-zA-Z0-9]|[a-zA-Z0-9][a-zA-Z0-9\-]{0,61}[a-zA-Z0-9])(\.([a-zA-Z0-9]|[a-zA-Z0-9][a-zA-Z0-9\-]{0,61}[a-zA-Z0-9]))*$

You should also check separately that the total length of the hostname must not exceed 255 characters. For more information, please consult RFC-952 and RFC-1123.

Upvotes: 79

Alex Volkov
Alex Volkov

Reputation: 2962

I don't seem to be able to edit the top post, so I'll add my answer here.

For hostname - easy answer, on egrep example here -- http: //www.linuxinsight.com/how_to_grep_for_ip_addresses_using_the_gnu_egrep_utility.html

egrep '([[:digit:]]{1,3}\.){3}[[:digit:]]{1,3}'

Though the case doesn't account for values like 0 in the fist octet, and values greater than 254 (ip addres) or 255 (netmask). Maybe an additional if statement would help.

As for legal dns hostname, provided that you are checking for internet hostnames only (and not intranet), I wrote the following snipped, a mix of shell/php but it should be applicable as any regular expression.

first go to ietf website, download and parse a list of legal level 1 domain names:

tld=$(curl -s http://data.iana.org/TLD/tlds-alpha-by-domain.txt |  sed 1d  | cut -f1 -d'-' | tr '\n' '|' | sed 's/\(.*\)./\1/')
echo "($tld)"

That should give you a nice piece of re code that checks for legality of top domain name, like .com .org or .ca

Then add first part of the expression according to guidelines found here -- http: //www.domainit.com/support/faq.mhtml?category=Domain_FAQ&question=9 (any alphanumeric combination and '-' symbol, dash should not be in the beginning or end of an octet.

(([a-z0-9]+|([a-z0-9]+[-]+[a-z0-9]+))[.])+

Then put it all together (PHP preg_match example):

$pattern = '/^(([a-z0-9]+|([a-z0-9]+[-]+[a-z0-9]+))[.])+(AC|AD|AE|AERO|AF|AG|AI|AL|AM|AN|AO|AQ|AR|ARPA|AS|ASIA|AT|AU|AW|AX|AZ|BA|BB|BD|BE|BF|BG|BH|BI|BIZ|BJ|BM|BN|BO|BR|BS|BT|BV|BW|BY|BZ|CA|CAT|CC|CD|CF|CG|CH|CI|CK|CL|CM|CN|CO|COM|COOP|CR|CU|CV|CX|CY|CZ|DE|DJ|DK|DM|DO|DZ|EC|EDU|EE|EG|ER|ES|ET|EU|FI|FJ|FK|FM|FO|FR|GA|GB|GD|GE|GF|GG|GH|GI|GL|GM|GN|GOV|GP|GQ|GR|GS|GT|GU|GW|GY|HK|HM|HN|HR|HT|HU|ID|IE|IL|IM|IN|INFO|INT|IO|IQ|IR|IS|IT|JE|JM|JO|JOBS|JP|KE|KG|KH|KI|KM|KN|KP|KR|KW|KY|KZ|LA|LB|LC|LI|LK|LR|LS|LT|LU|LV|LY|MA|MC|MD|ME|MG|MH|MIL|MK|ML|MM|MN|MO|MOBI|MP|MQ|MR|MS|MT|MU|MUSEUM|MV|MW|MX|MY|MZ|NA|NAME|NC|NE|NET|NF|NG|NI|NL|NO|NP|NR|NU|NZ|OM|ORG|PA|PE|PF|PG|PH|PK|PL|PM|PN|PR|PRO|PS|PT|PW|PY|QA|RE|RO|RS|RU|RW|SA|SB|SC|SD|SE|SG|SH|SI|SJ|SK|SL|SM|SN|SO|SR|ST|SU|SV|SY|SZ|TC|TD|TEL|TF|TG|TH|TJ|TK|TL|TM|TN|TO|TP|TR|TRAVEL|TT|TV|TW|TZ|UA|UG|UK|US|UY|UZ|VA|VC|VE|VG|VI|VN|VU|WF|WS|XN|XN|XN|XN|XN|XN|XN|XN|XN|XN|XN|YE|YT|YU|ZA|ZM|ZW)[.]?$/i';

    if (preg_match, $pattern, $matching_string){
    ... do stuff
    }

You may also want to add an if statement to check that string that you checking is shorter than 256 characters -- http://www.ops.ietf.org/lists/namedroppers/namedroppers.2003/msg00964.html

Upvotes: 6

Bill Stephens
Bill Stephens

Reputation: 1

Here is a regex that I used in Ant to obtain a proxy host IP or hostname out of ANT_OPTS. This was used to obtain the proxy IP so that I could run an Ant "isreachable" test before configuring a proxy for a forked JVM.

^.*-Dhttp\.proxyHost=(\w{1,}\.\w{1,}\.\w{1,}\.*\w{0,})\s.*$

Upvotes: 0

Related Questions