Reputation: 418

Regular Expression for validating DNS label ( host name)

I would like to validate a hostname using only a regular expression.

Host names (or 'labels' in DNS jargon) were traditionally defined by RFC 952 and RFC 1123 and may be composed of the following characters:

A to Z ; upper case characters
a to z ; lower case characters
0 to 9 ; numeric characters
- ; dash

The rules say:

A host name (label) can start or end with a letter or a number
A host name (label) MUST NOT start or end with a '-' (dash)
A host name (label) MUST NOT consist of all numeric values
A host name (label) can be up to 63 characters

How would you write a regular expression to validate a hostname?

Upvotes: 26

Answers (7)

Bill Cole

Reputation: 151

It is worth noting that DNS labels and hostname components have slightly different rules. Most notably: '_' is not legal in any component of a hostname, but is a standard part of labels used for things like SRV records.

A more readable and portable approach is to require a string to match both of these POSIX EREs:

^([[:alnum:]][[:alnum:]-]{0,61}[[:alnum:]]|[[:alpha:]])$
^.*[^[:digit:]].*$

Those should be easy to use in any standards-compliant ERE implementation. Perl-style lookaround, as in the Python example, is widely available, but it does not behave exactly the same in every engine that appears to support it. Ouch.

It is possible in principle to make a single ERE of those two lines, but it would be long and unwieldy. The first line handles all of the rules other than the ban on all-digit labels; the second line rejects those.
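
A minimal sketch of the two-pattern check in Python. Python's re module does not support POSIX classes such as [[:alnum:]], so this sketch assumes ASCII-only input and spells the classes out. The function name is_valid_label is hypothetical.

```python
import re

# ASCII translations of the two EREs above (assumption: ASCII-only labels).
SHAPE = re.compile(r"([A-Za-z0-9][A-Za-z0-9-]{0,61}[A-Za-z0-9]|[A-Za-z])")
HAS_NON_DIGIT = re.compile(r"[^0-9]")


def is_valid_label(label: str) -> bool:
    """Return True only if the label passes both checks."""
    # fullmatch anchors at both ends, so ^ and $ are not needed.
    shape_ok = SHAPE.fullmatch(label) is not None
    # search succeeds as soon as one non-digit character is found.
    not_all_digits = HAS_NON_DIGIT.search(label) is not None
    return shape_ok and not_all_digits


for candidate in ["abc", "a", "0--0", "01010", "-abc", "abc-", ""]:
    print(repr(candidate), is_valid_label(candidate))
```

Keeping the two checks separate mirrors the two-line approach: each pattern stays short enough to read at a glance.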

Upvotes: 5

mecampbellsoup

Reputation: 1481

The Kubernetes (k8s) API error message includes the regex it uses to validate an RFC 1123 label:

(⎈ minikube:default)➜  cloud-app git:(mc/72-org-ns-names) ✗ k create ns not-valid1234234$%
The Namespace "not-valid1234234$%" is invalid: metadata.name: 
Invalid value: "not-valid1234234$%": a lowercase RFC 1123 label must consist of lower case 
alphanumeric characters or '-', and must start and end with an alphanumeric character 
(e.g. 'my-name',  or '123-abc', regex used for validation is
 '[a-z0-9]([-a-z0-9]*[a-z0-9])?')
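
The pattern in that message carries no anchors and no length limit. A sketch of applying it in Python, assuming the whole string must match and that the 63-character maximum from the question applies. The function name is_rfc1123_label is hypothetical.

```python
import re

# Pattern copied from the error message above.
LABEL_PATTERN = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?")
MAX_LABEL_LENGTH = 63  # assumption: limit taken from the question's rules


def is_rfc1123_label(value: str) -> bool:
    """Check length first, then require the pattern to cover the whole string."""
    if len(value) > MAX_LABEL_LENGTH:
        return False
    return LABEL_PATTERN.fullmatch(value) is not None


for candidate in ["my-name", "123-abc", "not-valid1234234$%", "My-Name"]:
    print(repr(candidate), is_rfc1123_label(candidate))
```

Note that this pattern accepts all-numeric labels and rejects upper case, so it is stricter in one way and looser in another than the rules listed in the question.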

Upvotes: 8

Corey Ballou

Reputation: 43457

While the accepted answer is correct, RFC 2181 also states in Section 11, "Name Syntax":

The DNS itself places only one restriction on the particular labels that can be used to identify resource records. That one restriction relates to the length of the label and the full name. [...] Implementations of the DNS protocols must not place any restrictions on the labels that can be used. In particular, DNS servers must not refuse to serve a zone because it contains labels that might not be acceptable to some DNS client programs.

This in turn means other characters such as underscores should be allowed.
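
If only the length limits are enforced, no regular expression is needed at all. A sketch, assuming limits of 1 to 63 octets per label and 255 octets for the full name in wire format. The function name and the use of UTF-8 to count octets are assumptions made here for illustration.

```python
MAX_LABEL_OCTETS = 63
MAX_NAME_OCTETS = 255  # wire format: one length byte per label plus the root byte


def fits_dns_length_limits(name: str) -> bool:
    """Check only label and full-name lengths, not which characters are used."""
    # Drop a single trailing dot (fully qualified form) before splitting.
    if name.endswith("."):
        name = name[:-1]
    wire_length = 1  # the terminating root label
    for label in name.split("."):
        octets = len(label.encode("utf-8"))  # assumption: count UTF-8 octets
        if not 1 <= octets <= MAX_LABEL_OCTETS:
            return False  # empty or over-long label
        wire_length += octets + 1  # label bytes plus its length byte
    return wire_length <= MAX_NAME_OCTETS


print(fits_dns_length_limits("_sip._tcp.example.com"))
print(fits_dns_length_limits("a" * 64 + ".com"))
```

Under this check a name such as _sip._tcp.example.com passes, which a hostname regex would reject because of the underscores.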

Upvotes: 3

Dominic Sayers

Reputation: 1793

A revised regex based on comments here and my own reading of RFCs 1035 & 1123:

Ruby: \A(?!-)[a-zA-Z0-9-]{1,63}(?<!-)\z (tests below)

Python: ^(?!-)[a-zA-Z0-9-]{1,63}(?<!-)$ (not tested by me)

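Javascript: pattern = /^(?!-)(?!.*-$)[a-zA-Z0-9-]{1,63}$/; (based on Tom Lime's answer, not tested by me; the (?!.*-$) lookahead rejects a trailing dash, and the g flag is omitted because it makes repeated test() calls stateful)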

Tests:

tests = [
  ['01010', true],
  ['abc', true],
  ['A0c', true],
  ['A0c-', false],
  ['-A0c', false],
  ['A-0c', true],
  ['o123456701234567012345670123456701234567012345670123456701234567', false],
  ['o12345670123456701234567012345670123456701234567012345670123456', true],
  ['', false],
  ['a', true],
  ['0--0', true],
  ["A0c\nA0c", false]
]

regex = /\A(?!-)[a-zA-Z0-9-]{1,63}(?<!-)\z/
tests.each do |label, expected|
  is_match = !!(regex =~ label)
  puts is_match == expected
end

Notes:

Thanks to Mark Byers for the original code fragment
solidsnack points out that RFC 1123 allows all-numeric labels (https://www.rfc-editor.org/rfc/rfc1123#page-13)
RFC 1035 does not allow zero-length labels (https://www.rfc-editor.org/rfc/rfc1035): <label> ::= <letter> [ [ <ldh-str> ] <let-dig> ]
I've added a test specifically for Ruby that ensures a new line is not embedded in the label. This is thanks to notes by ssorallen.
This code is available here: https://github.com/Xenapto/domain-label-validation - I'm happy to accept pull requests if you want to update it.
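
The Python pattern above is marked as untested. A sketch of running a similar test table against it: it assumes re.fullmatch in place of ^ and $, because in Python $ also matches just before a trailing newline, so 'A0c\n' would otherwise be accepted. The long labels are built with string repetition rather than copied, and is_label is a hypothetical helper name.

```python
import re

# Same pattern as the Python line above, minus ^ and $ (fullmatch anchors it).
LABEL = re.compile(r"(?!-)[a-zA-Z0-9-]{1,63}(?<!-)")


def is_label(candidate: str) -> bool:
    return LABEL.fullmatch(candidate) is not None


tests = [
    ("01010", True),
    ("abc", True),
    ("A0c", True),
    ("A0c-", False),
    ("-A0c", False),
    ("A-0c", True),
    ("a" * 64, False),  # one character over the limit
    ("a" * 63, True),   # exactly at the limit
    ("", False),
    ("a", True),
    ("0--0", True),
    ("A0c\nA0c", False),
    ("A0c\n", False),   # trailing newline must not slip through
]

for label, expected in tests:
    print(repr(label[:12]), is_label(label) == expected)
```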

Upvotes: 4

Ross Allen

Reputation: 44880

In Ruby, ^ and $ match at the start and end of each line rather than of the whole string, so tools such as Rails warn against using them for validation. This is Mark's answer with safe start- and end-of-string anchors:

\A(?![0-9]+$)(?!-)[a-zA-Z0-9-]{,63}(?<!-)\z
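
For comparison, Python shows the same pitfall only when re.MULTILINE is set. A sketch of the difference between line anchors and string anchors, using a simplified label pattern and a made-up payload chosen here for illustration:

```python
import re

# Hypothetical input: a valid-looking first line followed by something else.
payload = "good-label\n<script>"

# With re.MULTILINE, ^ and $ match at line boundaries, as they always do in Ruby.
line_anchored = re.compile(r"^[a-zA-Z0-9-]{1,63}$", re.MULTILINE)
# \A and \Z match only at the very start and very end of the string.
string_anchored = re.compile(r"\A[a-zA-Z0-9-]{1,63}\Z")

line_ok = line_anchored.search(payload) is not None
string_ok = string_anchored.search(payload) is not None
print(line_ok, string_ok)  # prints: True False
```

The line-anchored pattern accepts the payload because its first line alone is a valid label, which is exactly the case the string anchors are meant to rule out.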

Upvotes: 3

Tom Lime

Reputation: 1204

Javascript regex based on Mark's answer:

pattern = /^(?![0-9]+$)(?!.*-$)(?!-)[a-zA-Z0-9-]{1,63}$/g;

Upvotes: 15

Mark Byers

Reputation: 838216

^(?![0-9]+$)(?!-)[a-zA-Z0-9-]{,63}(?<!-)$

I used the following testbed written in Python to verify that it works correctly:

tests = [
    ('01010', False),
    ('abc', True),
    ('A0c', True),
    ('A0c-', False),
    ('-A0c', False),
    ('A-0c', True),
    ('o123456701234567012345670123456701234567012345670123456701234567', False),
    ('o12345670123456701234567012345670123456701234567012345670123456', True),
    ('', True),
    ('a', True),
    ('0--0', True),
]

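import re
# Raw string so the pattern reaches the regex engine unchanged.
regex = re.compile(r'^(?![0-9]+$)(?!-)[a-zA-Z0-9-]{,63}(?<!-)$')
for (s, expected) in tests:
    is_match = regex.match(s) is not None
    # Each row compares the actual result with the expected one.
    print(is_match == expected)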

Upvotes: 22
