Reputation: 131
I want to create a script that will add new domains to our DNS servers. I found this fully qualified domain name validation regex. However, when I use it with sed, it does not work as I would expect:
echo test | sed '/(?=^.{5,254}$)(^(?:(?!\d+\.)[a-zA-Z0-9_\-]{1,63}\.?)+(:[a-zA-Z]{2,})$)/p'
--------
Output is:
test
echo test.com | sed '/(?=^.{5,254}$)(^(?:(?!\d+\.)[a-zA-Z0-9_\-]{1,63}\.?)+(:[a-zA-Z]{2,})$)/p'
--------
Output is:
test.com
I expected the output of the first command to be a blank line. What am I doing wrong?
Upvotes: 2
Views: 14106
Reputation: 1118
I find this to be a more comprehensive regex:
(?=^.{4,253}$)(^(?:[a-zA-Z0-9](?:(?:[a-zA-Z0-9\-]){0,61}[a-zA-Z0-9])?\.)+([a-zA-Z]{2,}|xn--[a-zA-Z0-9][a-zA-Z0-9\-]*[a-zA-Z0-9])$)
(?=^.{4,253}$)
(?:[a-zA-Z0-9](?:(?:[a-zA-Z0-9\-]){0,61}[a-zA-Z0-9])?\.)
([a-zA-Z]{2,}|xn--[a-zA-Z0-9][a-zA-Z0-9\-]*[a-zA-Z0-9])
RFC 3696§2: The DNS spec technically permits numerics in the TLD, as well as single-letter TLDs; however, there are currently no single-letter TLDs or TLDs with numbers, and all-numeric TLDs are not permitted, so this part of the regex has been simplified to [a-zA-Z]{2,}.
--OR--
RFC 3490§5: an internationalized domain name ccTLD (IDN ccTLD) may be punycoded, as indicated by an "xn--" prefix, after which it may contain letters, numbers, or hyphens. This approximates to xn--[a-zA-Z0-9][a-zA-Z0-9\-]*[a-zA-Z0-9]
Be aware that this pattern does not validate a punycode TLD! Invalid punycode will be tolerated, e.g. "xn--qqqq", because attempting to validate punycode against the appropriate encoding mechanisms is beyond the scope of a regular expression. While punycode itself technically permits an encoded string ending in a hyphen, RFC 3492§5 observes and respects the IDNA limitation that labels may not end in a hyphen.
EDIT 02/2021: Hat tip to user2241415 for pointing out that IDN ccTLDs did not match the previously-specified regex.
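As a rough illustration of how the pattern above might be used from a script, here is a minimal sketch that wraps it in a shell function. It assumes a grep built with PCRE support (the -P option, as in GNU grep); the names fqdn_re and is_fqdn are hypothetical and chosen only for this example.

```shell
#!/bin/bash
# Pattern copied from the answer above; requires a PCRE-capable grep (-P).
fqdn_re='(?=^.{4,253}$)(^(?:[a-zA-Z0-9](?:(?:[a-zA-Z0-9\-]){0,61}[a-zA-Z0-9])?\.)+([a-zA-Z]{2,}|xn--[a-zA-Z0-9][a-zA-Z0-9\-]*[a-zA-Z0-9])$)'

# is_fqdn NAME: exit status 0 if NAME matches the pattern, non-zero otherwise.
# (Hypothetical helper name, not part of the original answer.)
is_fqdn() {
  printf '%s\n' "$1" | grep -qP -- "$fqdn_re"
}

# Report each candidate as valid or invalid according to the pattern.
for name in test test.com www.example.co.uk -bad.example.com; do
  if is_fqdn "$name"; then
    echo "$name: valid"
  else
    echo "$name: invalid"
  fi
done
```

Using the exit status rather than the printed match keeps the check easy to combine with if or &&. As noted above, a match only means the name is syntactically plausible; it says nothing about whether the punycode is valid or the domain exists.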
Upvotes: 14
Reputation: 1563
If the domain has to exist, you can try:
$ cat test.sh
#!/bin/bash
for h in "bert" "ernie" "www.google.com"
do
# Redirect stdout first, then stderr, so both are discarded; test the exit status directly
if host "$h" > /dev/null 2>&1
then
echo "$h is a FQDN"
else
echo "$h is not a FQDN"
fi
done
jalderman@mba:/tmp$ ./test.sh
bert is not a FQDN
ernie is not a FQDN
www.google.com is a FQDN
Upvotes: -2
Reputation: 7588
I use grep -P to do this.
echo test | grep -P "^[a-zA-Z0-9][a-zA-Z0-9-]{1,61}[a-zA-Z0-9](?:\.[a-zA-Z]{2,})+$"
--------
Output is:
echo www.test.com | grep -P "^[a-zA-Z0-9][a-zA-Z0-9-]{1,61}[a-zA-Z0-9](?:\.[a-zA-Z]{2,})+$"
--------
Output is: www.test.com
Upvotes: 0
Reputation: 189307
No sed implementation I am aware of supports the various Perl extensions you are using in that regex. Try with Perl or grep -P or pcregrep, or simplify the regex to something sed can cope with. Here is a quick and dirty adaptation which splits the regex into a script of three different regexes, and rejects when something fails to match (or matches, in the middlemost case).
echo 'test' | sed -r '/^.{5,254}$/!d
/^([^.]*\.)*[0-9]+\./d # Seems incorrect; 112.com is valid
/^([a-zA-Z0-9_\-]{1,63}\.?)+([a-zA-Z]{2,})$/!d' # should disallow underscore
# also, what's with the question mark after the literal dot?
This also completely fails to accept IDNA domains (which can contain dashes and numbers in the TLD, among other things) so I would definitely not recommend this, but hopefully it shows you how to adapt something like this to sed if you wish to.
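For comparison, here is a sketch of how the same split-into-steps idea might look once the concerns raised in the comments above are addressed: numeric labels such as 112.com are accepted, underscores are rejected, and every label must be followed by a literal dot. It assumes a sed that accepts -E for extended regexes (GNU and BSD sed both do), uses a lower length bound of 4 rather than 5 as an assumption, and still does not handle IDNA TLDs. The function name is_fqdn_sed is hypothetical.

```shell
#!/bin/bash
# is_fqdn_sed NAME: exit status 0 if NAME passes both sed checks.
# (Hypothetical helper name; a sketch, not a complete validator.)
is_fqdn_sed() {
  printf '%s\n' "$1" | sed -E -n '
    # Step 1: drop anything outside the overall length limit.
    /^.{4,253}$/!d
    # Step 2: print only if each label is 1-63 letters, digits or hyphens,
    # starts and ends with a letter or digit, and the TLD is alphabetic.
    /^([a-zA-Z0-9]([a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,}$/p
  ' | grep -q .
}

for name in test test.com 112.com foo_bar.com; do
  if is_fqdn_sed "$name"; then
    echo "$name: accepted"
  else
    echo "$name: rejected"
  fi
done
```

The trailing grep -q . only converts "sed printed something" into an exit status, so the function can be used directly in an if statement.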
Upvotes: 1
Reputation: 171
Pierre-Louis' answer didn't quite work for me: for example, "kittens" is considered a domain name. I added one slight adjustment to ensure that the domain has at least one dot in it.
(?=^.{5,254}$)(^(?:(?!\d+\.)[a-zA-Z0-9_\-]{1,63}\.?)+\.(?:[a-z]{2,})$)
There's an extra \. just before it reads the last portion of the domain.
Upvotes: 0
Reputation: 1478
You are missing a question mark in your regex:
(?=^.{5,254}$)(^(?:(?!\d+\.)[a-zA-Z0-9_\-]{1,63}\.?)+(?:[a-zA-Z]{2,})$)
You can test your regex here
You can do what you want with grep:
$ echo test.com | grep -P '(?=^.{5,254}$)(^(?:(?!\d+\.)[a-zA-Z0-9_\-]{1,63}\.?)+(?:[a-zA-Z]{2,})$)'
test.com
$ echo test | grep -P '(?=^.{5,254}$)(^(?:(?!\d+\.)[a-zA-Z0-9_\-]{1,63}\.?)+(?:[a-zA-Z]{2,})$)'
$
Upvotes: 3