Marco Bonelli

Reputation: 69346

RegExp to match comma separated hostnames

The problem

I'm trying to validate the content of a <textarea> using JavaScript, so I created a validate() function, which returns true or false depending on whether the text inside the textarea is valid.

The textarea can only contain comma-separated hostnames. By hostname I mean something like subdomain.domain.com, so it's basically a series of dot-separated strings. Since users don't tend to type carefully, I also want to allow any number of spaces between the hostnames and the commas, but not inside a hostname.

Here are some examples of what should or shouldn't match:

What I have tried so far

I built my function using the following code:

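function validate(text) {
    return (
        // List of [a-z0-9.-] runs separated by commas with optional spaces;
        // the final [^, ] consumes one more character that is neither a comma nor a space
        (/^([a-z0-9\-\.]+ *, *)*[a-z0-9\-\.]+[^, ]$/i.test(text) 
        // Reject a dot followed by a non-letter, and a trailing dot
        && !/\.[^a-z]|\.$/i.test(text)
        // Require at least one dot anywhere in the text (not per hostname);
        // ~(-1) === 0, so this is falsy only when there is no dot at all
        && ~text.indexOf('.'))
    );
}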

Unfortunately, my function just doesn't work: it doesn't catch incomplete hostnames and returns true for them.

Is there any way to accomplish this, maybe without regular expressions? That said, I'd prefer to use a single RegExp.

Upvotes: 4

Views: 5620

Answers (7)

Gawrion

Reputation: 87

This regex should match every requirement for domains. It limits the TLD to 24 characters because the longest TLD currently has 24 characters, but you can raise it to the theoretical maximum of 63 characters (then you must change "25" to "64"; keep in mind that there are two instances of it):

^\s*(?!.*?_.*?)(?!(?:[\d\w]+?\.)?\-[\w\d\.\-]*?)(?![\w\d]+?\-\.(?:[\d\w\.\-]+?))(?=[\w\d])(?=[\w\d\.\-]*?\.+[\w\d\.\-]*?)(?![\w\d\.\-]{254})(?!(?:\.?[\w\d\-\.]*?[\w\d\-]{64,}\.)+?)[\w\d\.\-]+?(?<![\w\d\-\.]*?\.[\d]+?)(?<=[\w\d\-]{2,})(?<![\w\d\-]{25})(\s*,\s*(?!.*?_.*?)(?!(?:[\d\w]+?\.)?\-[\w\d\.\-]*?)(?![\w\d]+?\-\.(?:[\d\w\.\-]+?))(?=[\w\d])(?=[\w\d\.\-]*?\.+[\w\d\.\-]*?)(?![\w\d\.\-]{254})(?!(?:\.?[\w\d\-\.]*?[\w\d\-]{64,}\.)+?)[\w\d\.\-]+?(?<![\w\d\-\.]*?\.[\d]+?)(?<=[\w\d\-]{2,})(?<![\w\d\-]{25}))*\s*$

You can test it here: https://regex101.com/r/ZyPMn4/1
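If the TLD cap is likely to change, it may be easier to move the length rules out of the regex and into code, where the limit is a single parameter instead of two places in the pattern. A minimal sketch, assuming a hypothetical withinLengthLimits helper (not part of the answer above); it checks lengths only, so it would be paired with a simpler character-level pattern:

```javascript
// Hypothetical helper: checks only length limits of the kind the regex above
// encodes with lookarounds. It does NOT check which characters are allowed.
function withinLengthLimits(host, maxTldLength) {
    if (maxTldLength === undefined) maxTldLength = 24; // raise to 63 if preferred

    // Whole hostname: at most 253 characters
    if (host.length > 253) return false;

    var labels = host.split('.');

    // Every dot-separated label: 1 to 63 characters (also rejects empty labels like "a..b")
    var badLabel = labels.some(function (label) {
        return label.length === 0 || label.length > 63;
    });
    if (badLabel) return false;

    // The last label (the TLD) has its own, configurable cap
    return labels[labels.length - 1].length <= maxTldLength;
}
```

With this split, changing the TLD limit means passing a different second argument, e.g. withinLengthLimits(host, 63).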

Upvotes: 0

Brock Amhurst

Reputation: 497

function validate() {
    //Get the user input
    var hostnames = document.getElementById('yourtextarea').value;
    //Regex to validate hostname
    var re = new RegExp(/^([a-zA-Z0-9]([a-zA-Z0-9\-]{0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,6}$/);
    //Trim whitespace
    hostnames = hostnames.trim();
    //Explode into an array
    hostnames = hostnames.split(",");
    //Loop through array & test each hostname with regex
    var is_valid = true;
    for (var i=0; i < hostnames.length; i++){
        var hostname = hostnames[i].trim();
        if (re.test(hostname)) {
           is_valid = true; //if valid, continue loop
        } else {
           is_valid = false; //if invalid, break loop and return false
           break;
        }
    } //end for loop
    return is_valid;
} //end function validate()

This matches every example you gave except "dom-ain.it, domain.com, domain.eu.org.something", because the TLD part of the regex only accepts 2 to 6 letters, so "something" is rejected.

JSFiddle: http://jsfiddle.net/nesutqjf/2/
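The flag-and-break loop above can also be written with Array.prototype.every, which stops at the first hostname that fails. A sketch, assuming the text is passed in as a parameter instead of being read from the DOM (validateHostnames and HOSTNAME_RE are illustrative names, not from the answer); the regex is unchanged:

```javascript
// Same per-hostname regex as above (TLD limited to 2-6 letters)
var HOSTNAME_RE = /^([a-zA-Z0-9]([a-zA-Z0-9\-]{0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,6}$/;

// Takes the text as a parameter so it can be tested without a textarea
function validateHostnames(text) {
    return text.split(',').every(function (hostname) {
        // trim() removes the spaces allowed around each comma
        return HOSTNAME_RE.test(hostname.trim());
    });
}
```

Usage from the page would then be validateHostnames(document.getElementById('yourtextarea').value).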

Upvotes: 1

cbreezier

Reputation: 1314

The answers saying not to use a regex are perfectly fine, but I like regex, so:

^\s*(?:(?:\w+(?:-+\w+)*\.)+[a-z]+)\s*(?:,\s*(?:(?:\w+(?:-+\w+)*\.)+[a-z]+)\s*)*$

Yeah, it's not pretty, but it works: I tested it on your sample cases at http://regex101.com.

Edit: OK, let's break it down. I've also changed it to allow sub-domain-01.com and a--b.com but not -.com.

Each subdomain thingo: \w+(?:-+\w+)* matches a string of word characters, optionally followed by more words, each preceded by one or more dashes.

Each hostname: \s*(?:(?:\w+(?:-+\w+)*\.)+[a-z]+)\s* is one or more subdomain thingos, each followed by a dot, and finally a string of letters only (the TLD). And of course the optional spaces on either side.

Whole thing: \s*(?:(?:\w+(?:-+\w+)*\.)+[a-z]+)\s*(?:,\s*(?:(?:\w+(?:-+\w+)*\.)+[a-z]+)\s*)* is a single hostname followed by zero or more ,hostname groups for the comma-separated list, wrapped in ^ and $ so the whole string has to match.

Pretty simple really.
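Because the hostname part appears twice in the full pattern, one way to keep it maintainable is to assemble the regex from the pieces in this breakdown. A sketch using plain string concatenation (LABEL, HOSTNAME and HOSTNAME_LIST are illustrative names); the resulting source is the same pattern as above:

```javascript
// Each piece from the breakdown, written once as a string.
// Backslashes are doubled because these are ordinary string literals.
var LABEL = '\\w+(?:-+\\w+)*';                     // one "subdomain thingo"
var HOSTNAME = '(?:(?:' + LABEL + '\\.)+[a-z]+)'; // thingos + dots, then a letters-only TLD

// One hostname, then zero or more ",hostname" groups, with optional spaces
var HOSTNAME_LIST = new RegExp(
    '^\\s*' + HOSTNAME + '\\s*(?:,\\s*' + HOSTNAME + '\\s*)*$'
);
```

Keep in mind that \w also matches underscores, and since there is no i flag the TLD has to be lowercase.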

Upvotes: 3

Xotic750

Reputation: 23482

An example with validator.js, which has well-tested routines for checking whether a string is a valid FQDN. Alternatively, look through its source and grab what you need.

function validate (e) {
    var target = e.target || e;
    
    target.value.split(',').some(function (item) {
        var notValid = !validator.isFQDN(item.trim());
        
        if (notValid) {
            target.classList.add('bad');
        } else {
            target.classList.remove('bad');
        }

        return notValid;
    });
}

var domains = document.getElementById('domains');

domains.addEventListener('change', validate);

validate(domains);
#domains {
    width: 300px;
    height: 100px;
}
.bad {
    background-color: red
}
<script src="http://rawgit.com/chriso/validator.js/master/validator.js"></script>
<textarea id="domains">www.example.com, example.com, example.ca, example, example.com example.nl www.example, www.exam ple.com</textarea>

Upvotes: 0

Mouser

Reputation: 13304

While @dandavis's answer/comment is impressive, let's break it down into steps.

  1. Get the value from the textarea and trim() leading and trailing whitespace.
  2. Replace every run of whitespace with a single space using /\s+/g, which matches one or more consecutive whitespace characters.
  3. Split on a comma with an optional <space> on either side (for example ,<space> or <space>,<space>). split() returns an array.
  4. Iterate over the array elements with filter().
  5. Keep each element that is a valid domain.

var domains = document.querySelector("textarea").value;
// Collapse whitespace runs, then split on a comma with an optional space on either side
domains = domains.trim().replace(/\s+/g, " ").split(/\s?,\s?/);

var domainsTested = domains.filter(function (element) {
    // filter() keeps the elements for which the callback returns true
    return /^[a-zA-Z0-9][a-zA-Z0-9-_]{0,61}[a-zA-Z0-9]{0,1}\.([a-zA-Z]{1,6}|[a-zA-Z0-9-]{1,30}\.[a-zA-Z]{2,3})$/.test(element);
});

document.write(domainsTested.join(" | ")); //this is just here to show the results.
document.write("<br />Domainstring is ok: " + (domainsTested.length === domains.length)); //If it's valid, both lengths are equal.
<textarea style="width: 300px; height: 100px">www.example.com    , example.com, example.ca,     example, example.com example.nl     www.example,    www.exam ple.com,  sub.sub.sub.domain.tv, do main.it,   sub.domain.tv</textarea>
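Steps 2 and 3 above (collapsing whitespace, then splitting) can also be done in a single split on a comma surrounded by any amount of whitespace, including none. A minimal sketch; splitHostnames is a hypothetical name:

```javascript
// Split the textarea text into hostnames, allowing any amount of
// whitespace (including none) around each comma.
function splitHostnames(text) {
    return text.trim().split(/\s*,\s*/);
}
```

Spaces inside a hostname are not next to a comma, so they survive the split and the per-domain regex still rejects entries like www.exam ple.com.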

Upvotes: 1

Terry

Reputation: 66133

I have been using this pattern for a while, and it seems to work for your case, too:

/^[a-zA-Z0-9][a-zA-Z0-9\-_]*\.([a-zA-Z0-9]+|[a-zA-Z0-9\-_]+\.[a-zA-Z]+)+$/gi

The logic is simple:

  • ^[a-zA-Z0-9]: The URL must start with an alphanumeric character
  • [a-zA-Z0-9\-_]*: The first alphanumeric character can be followed by zero or more of: an alphanumeric character, an underscore or a dash
  • \.: The first piece must be followed by a period.
  • The rest must be one or more repetitions of either:
    1. [a-zA-Z0-9]+: one or more alphanumeric characters, OR
    2. [a-zA-Z0-9\-_]+\.[a-zA-Z]+: one or more alphanumeric characters, underscores or dashes, followed by a period and one or more letters

You can see this pattern working for most of your URLs in the following code snippet. My approach is similar to the strategy described by others:

  • Get the value of the textarea on keyup (or bind to submit, blur, keypress, keydown, change, etc.)
  • Split the value by the , character
  • Use $.trim() to remove flanking whitespaces
  • Use the RegEx pattern above to evaluate each individual string

Optional, done for visual output:

  • Generate a list of URLs
  • Indicate if each URL entered is valid or not

$(function() {
    $('textarea').keyup(function() {
        var urls = $(this).val().split(',');
        $('ul').empty();
        $.each(urls, function(i,v) {
            // Trim URL
            var url = $.trim(v);
            
            // RegEx
            var pat = /^[a-zA-Z0-9][a-zA-Z0-9\-_]*\.([a-zA-Z0-9]+|[a-zA-Z0-9\-_]+\.[a-zA-Z]+)+$/gi,
                test = pat.test(url);
            
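            // Append (built with .text() so pasted markup is shown as text, not parsed as HTML)
            $('ul').append(
                $('<li>').text(url + ' ').append($('<span>').text(String(test)))
            );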
        });
    });
});
textarea {
    width: 100%;
    height: 100px;
}
ul span {
    background-color: #eee;
    display: inline-block;
    margin-left: .25em;
    padding: 0 .25em;
    text-transform: uppercase;
}
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<textarea placeholder="Paste URLs here"></textarea>
<ul></ul>
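The same steps work without jQuery, as a plain function that any event handler could call. A sketch; checkHostnames and HOST_PATTERN are illustrative names, and String.prototype.trim stands in for $.trim. The g flag is dropped here because the regex is now shared between calls, and with g, test() resumes from the previous lastIndex, which can make alternate calls return different results:

```javascript
// The pattern from this answer, with the i flag but without g
var HOST_PATTERN = /^[a-zA-Z0-9][a-zA-Z0-9\-_]*\.([a-zA-Z0-9]+|[a-zA-Z0-9\-_]+\.[a-zA-Z]+)+$/i;

// Returns one { host, valid } entry per comma-separated item
function checkHostnames(text) {
    return text.split(',').map(function (item) {
        var host = item.trim(); // remove flanking whitespace
        return { host: host, valid: HOST_PATTERN.test(host) };
    });
}
```

The result can then be rendered into the list, or reduced to a single boolean with .every(function (r) { return r.valid; }).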

Upvotes: 0

Ixrec

Reputation: 976

I would not use regexps for this, because you have a lot of different rules you want to check. Regexps are good when you only have a couple of rules that are very simple to express but a pain to write out as "parsing code".

I'd simply do hostnames.split(',').every(validateHostname), as most of the comments suggest (every() returns false as soon as one hostname fails, whereas forEach() discards the return values), and inside validateHostname reject any hostname that has spaces in the middle, two adjacent dots, no dots, ends in a dot, has non-ASCII characters, has digits in the last dot-separated token, and so on.

A function like this will be much easier to add new rules to than a regexp would be.
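A minimal sketch of that approach; validate, validateHostname and the HOSTNAME_RULES list are illustrative names, and only the rules named above are implemented (a leading dot, for instance, would need another entry):

```javascript
// One function per rule, so new rules can simply be appended to the list
var HOSTNAME_RULES = [
    function noInnerSpaces(h) { return !/\s/.test(h); },
    function hasDot(h) { return h.indexOf('.') !== -1; },
    function noAdjacentDots(h) { return h.indexOf('..') === -1; },
    function noTrailingDot(h) { return h.charAt(h.length - 1) !== '.'; },
    function asciiOnly(h) { return /^[\x00-\x7F]*$/.test(h); },
    function noDigitsInLastLabel(h) { return !/\d/.test(h.slice(h.lastIndexOf('.') + 1)); }
];

function validateHostname(hostname) {
    var h = hostname.trim(); // spaces around the commas are allowed
    return HOSTNAME_RULES.every(function (rule) { return rule(h); });
}

function validate(text) {
    return text.split(',').every(validateHostname);
}
```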

Upvotes: 0
