CannibalSmith

Reputation: 4820

The Hostname Regex

I'm looking for the regex to validate hostnames. It must completely conform to the standard. Right now, I have

^[0-9a-z]([0-9a-z\-]{0,61}[0-9a-z])?(\.[0-9a-z]([0-9a-z\-]{0,61}[0-9a-z])?)*$

but it allows successive hyphens and hostnames longer than 255 characters. If the perfect regex is impossible, say so.

Edit/Clarification: a Google search didn't reveal that this is a solved (or proven unsolvable) problem. I want to create the definitive regex so that nobody has to write his own ever again. If dialects matter, I want a version for each one in which this can be done.

Upvotes: 31

Views: 51807

Answers (7)

KooliMed

Reputation: 41

I tried all the answers against the examples below and, unfortunately, none of them passed the test.

ec2-11-111-222-333.cd-blahblah-1.compute.amazonaws.com
domaine.com
subdomain.domain.com
12533d5.dkkkd.com
2dotsextension.co
1dotextension.c
ekkej_dhh.com
12552.2225
112.25.25
12345.com
12345.123.com
domaine.123
whatever
9999-ee.99
[email protected]
.jjdj.kkd
-subdomain.domain.com
@subdomain.domain.com
112.25.25

Here is a better solution.

^[A-Za-z0-9][A-Za-z0-9-.]*\.\D{2,4}$

Please post any case I have not considered at https://regex101.com/r/89zZkW/1

Upvotes: 2

jschultz410

Reputation: 2899

According to the relevant internet RFCs, and assuming your regex dialect supports positive and negative lookahead and lookbehind assertions:

If you want to validate a local/leaf hostname for use in an internet hostname (e.g., an FQDN), then:

^(?!-)[-a-zA-Z0-9]{1,63}(?<!-)$

That ^^^ is also the general check that a label component inside an internet hostname is valid.

If you want to validate an internet hostname (e.g., an FQDN), then:

^(?=.{1,253}\.?$)(?:(?!-)[-a-zA-Z0-9]{1,63}(?<!-)\.)*(?!-)[-a-zA-Z0-9]{1,63}(?<!-)\.?$
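
As a rough illustration (not part of the original answer), here is how the two patterns above could be wrapped in Python, whose re module supports the lookaround assertions they need. The function names are made up for this sketch. One caveat: Python's $ also matches just before a trailing newline, so the sketch assumes the input has already been stripped.

```python
import re

# Patterns copied verbatim from the answer above.
LABEL_RE = re.compile(r"^(?!-)[-a-zA-Z0-9]{1,63}(?<!-)$")
FQDN_RE = re.compile(
    r"^(?=.{1,253}\.?$)"
    r"(?:(?!-)[-a-zA-Z0-9]{1,63}(?<!-)\.)*"
    r"(?!-)[-a-zA-Z0-9]{1,63}(?<!-)\.?$"
)


def is_valid_label(label: str) -> bool:
    """Hypothetical helper: check a single label (no dots)."""
    return LABEL_RE.match(label) is not None


def is_valid_fqdn(name: str) -> bool:
    """Hypothetical helper: check a dotted name, trailing dot allowed."""
    return FQDN_RE.match(name) is not None


if __name__ == "__main__":
    print(is_valid_fqdn("example.com"))    # True
    print(is_valid_fqdn("example.com."))   # True (trailing dot)
    print(is_valid_fqdn("example..com"))   # False (empty label)
    print(is_valid_label("-example"))      # False (leading hyphen)
```

Note that these patterns still accept successive hyphens inside a label (e.g. a--b.com).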

Upvotes: 1

derekm

Reputation: 459

The approved answer validates invalid hostnames containing multiple dots (example..com). Here is a regex I came up with that I think exactly matches what is allowable under RFC requirements (minus an ending "." supported by some resolvers to short-circuit relative naming and force FQDN resolution).

Spec:

<hname> ::= <name>*["."<name>]
<name> ::= <letter-or-digit>[*[<letter-or-digit-or-hyphen>]<letter-or-digit>]

Regex:

^([a-zA-Z0-9](?:(?:[a-zA-Z0-9-]*|(?<!-)\.(?![-.]))*[a-zA-Z0-9]+)?)$

I've tested quite a few permutations myself, I think it is accurate.

This regex also does not do length validation. The RFC requires length limits on the labels between dots and on the whole name, but these are easy to check in second and third passes after validating against this regex: check the full string length, then split on "." and check the length of every substring. E.g., in JavaScript, label length validation might look like: "example.com".split(".").reduce(function (prev, curr) { return prev && curr.length <= 63; }, true).
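
For comparison, the second and third passes could be sketched in Python as follows. This is only an illustration: the function name is hypothetical, the 63-character label limit comes from the JavaScript snippet above, and the 255-character total is the figure used in the question (other answers here use 253). It assumes the regex pass has already accepted the string, so it does no syntax checking of its own.

```python
def lengths_ok(hostname: str, max_total: int = 255, max_label: int = 63) -> bool:
    """Length checks only; syntax is assumed to be validated elsewhere."""
    # Second pass: length of the whole name.
    if len(hostname) > max_total:
        return False
    # Third pass: length of each dot-separated label.
    return all(len(label) <= max_label for label in hostname.split("."))


if __name__ == "__main__":
    print(lengths_ok("example.com"))          # True
    print(lengths_ok("a" * 64 + ".com"))      # False (label too long)
```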


Alternative Regex (without negative lookbehind, courtesy of the HTML Living Standard):

^[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$

Upvotes: 14

CannibalSmith

Reputation: 4820

^(?=.{1,255}$)[0-9A-Za-z](?:(?:[0-9A-Za-z]|-){0,61}[0-9A-Za-z])?(?:\.[0-9A-Za-z](?:(?:[0-9A-Za-z]|-){0,61}[0-9A-Za-z])?)*\.?$

Upvotes: 32

nbari

Reputation: 26985

What about:

^(?=.{1,255})([0-9A-Za-z]|_{1}|\*{1}$)(?:(?:[0-9A-Za-z]|\b-){0,61}[0-9A-Za-z])?(?:\.[0-9A-Za-z](?:(?:[0-9A-Za-z]|\b-){0,61}[0-9A-Za-z])?)*\.?$

This matches only one '_' at the beginning (for some SRV records) and only one * (for a DNS wildcard label).

Upvotes: 0

nicerobot

Reputation: 9235

Your answer was relatively close.

But see

For a hostname RE, that Perl module produces

(?:(?:(?:(?:[a-zA-Z0-9][-a-zA-Z0-9]*)?[a-zA-Z0-9])[.])*(?:[a-zA-Z][-a-zA-Z0-9]*[a-zA-Z0-9]|[a-zA-Z])[.]?)

I would modify to be more accurate as:

(?:(?:(?:(?:[a-zA-Z0-9][-a-zA-Z0-9]{0,61})?[a-zA-Z0-9])[.])*(?:[a-zA-Z][-a-zA-Z0-9]{0,61}[a-zA-Z0-9]|[a-zA-Z])[.]?)

Optionally anchoring the ends with ^$ to ONLY match hostnames.

I don't think a single RE can accomplish full validation because, according to Wikipedia, there is a 255-character length restriction, which I don't think can be included within that same RE, at least not without a ton of changes. But it's easy enough to just check that the length is <= 255 before running the RE.
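
A minimal Python sketch of that two-step check, using the modified pattern above. The names are hypothetical, and re.fullmatch stands in for the optional ^ and $ anchors; the 255 limit is the figure quoted in this answer.

```python
import re

# Modified pattern from this answer, copied verbatim (unanchored).
HOSTNAME_RE = re.compile(
    r"(?:(?:(?:(?:[a-zA-Z0-9][-a-zA-Z0-9]{0,61})?[a-zA-Z0-9])[.])*"
    r"(?:[a-zA-Z][-a-zA-Z0-9]{0,61}[a-zA-Z0-9]|[a-zA-Z])[.]?)"
)

MAX_HOSTNAME_LENGTH = 255  # assumption: the limit quoted above


def is_hostname(candidate: str) -> bool:
    """Cheap length check first, then the regex over the whole string."""
    if len(candidate) > MAX_HOSTNAME_LENGTH:
        return False
    # fullmatch requires the pattern to cover the entire string.
    return HOSTNAME_RE.fullmatch(candidate) is not None


if __name__ == "__main__":
    print(is_hostname("example.com"))   # True
    print(is_hostname("example.123"))   # False (last label must start with a letter)
```

Note that this pattern requires the last label to start with a letter, so all-numeric top-level labels are rejected.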

Upvotes: 4

JaredPar

Reputation: 755317

Take a look at the following question. A few of its answers have regexes for host names.

Could you specify what language you want to use this regex in? Most languages / systems have slightly different regex implementations that will affect people's answers.

Upvotes: 1
