Reputation: 69346

RegExp to match comma separated hostnames

The problem

I'm trying to validate the content of a <textarea> using JavaScript, so I created a validate() function that returns true or false depending on whether the text inside the textarea is valid.

The textarea can only contain comma-separated hostnames. By hostname I mean something like subdomain.domain.com, so basically some dot-separated strings. Since users don't always type carefully, I also want to allow any amount of spaces between the hostnames and the commas, but not inside a hostname.

Here are some examples of what should or shouldn't match:

Should match:
- domain.com,domain2.co.vu,sub.domain.org
- domai2n.com , dom-ain.org.co.vu.nl ,domain.it
- dom-ain.it, domain.com, domain.eu.org.something
- a.b.c, a.b, a.a.a , a.r
- 0191481.com

Should not match:
- domain.com., sub.domain.it (incomplete hostname)
- domain.me, domain2 (incomplete hostname)
- sub.sub.sub.domain.tv, do main.it (hostname contains spaces)
- site (incomplete hostname)
- hèy.com (hostname cannot contain accents)
- hey.01com (the last part cannot be a number or contain digits)
- hello.org..wow (incomplete hostname)

What I have tried so far

I built my function using the following code:

function validate(text) {
    return (
        (/^([a-z0-9\-\.]+ *, *)*[a-z0-9\-\.]+[^, ]$/i.test(text) 
        && !/\.[^a-z]|\.$/i.test(text)
        && ~text.indexOf('.'))
    );
}

Unfortunately, my function doesn't work: it fails to recognize incomplete hostnames and returns true for them.

Is there any way to accomplish this? It doesn't have to use RegExps, although I'd prefer a single RegExp.

Upvotes: 4

Answers (7)

Gawrion

Reputation: 87

This regex should match all the requirements for domains. It limits the TLD to 24 characters because that is currently the longest TLD, but you can change it to the theoretical maximum of 63 characters (then you must change "25" to "64"; keep in mind that there are two instances of it):

^\s*(?!.*?_.*?)(?!(?:[\d\w]+?\.)?\-[\w\d\.\-]*?)(?![\w\d]+?\-\.(?:[\d\w\.\-]+?))(?=[\w\d])(?=[\w\d\.\-]*?\.+[\w\d\.\-]*?)(?![\w\d\.\-]{254})(?!(?:\.?[\w\d\-\.]*?[\w\d\-]{64,}\.)+?)[\w\d\.\-]+?(?<![\w\d\-\.]*?\.[\d]+?)(?<=[\w\d\-]{2,})(?<![\w\d\-]{25})(\s*,\s*(?!.*?_.*?)(?!(?:[\d\w]+?\.)?\-[\w\d\.\-]*?)(?![\w\d]+?\-\.(?:[\d\w\.\-]+?))(?=[\w\d])(?=[\w\d\.\-]*?\.+[\w\d\.\-]*?)(?![\w\d\.\-]{254})(?!(?:\.?[\w\d\-\.]*?[\w\d\-]{64,}\.)+?)[\w\d\.\-]+?(?<![\w\d\-\.]*?\.[\d]+?)(?<=[\w\d\-]{2,})(?<![\w\d\-]{25}))*\s*$

You can test it here: https://regex101.com/r/ZyPMn4/1

Upvotes: 0

Brock Amhurst

Reputation: 497

function validate() {
    //Get the user input
    var hostnames = document.getElementById('yourtextarea').value;
    //Regex to validate hostname
    var re = new RegExp(/^([a-zA-Z0-9]([a-zA-Z0-9\-]{0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,6}$/);
    //Trim whitespace
    hostnames = hostnames.trim();
    //Explode into an array
    hostnames = hostnames.split(",");
    //Loop through array & test each hostname with regex
    var is_valid = true;
    for (var i=0; i < hostnames.length; i++){
        var hostname = hostnames[i].trim();
        if (re.test(hostname)) {
           is_valid = true; //if valid, continue loop
        } else {
           is_valid = false; //if invalid, break loop and return false
           break;
        }
    } //end for loop
    return is_valid;
} //end function validate()

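Matches the examples you indicated except "dom-ain.it, domain.com, domain.eu.org.something", because "something" is longer than the 2 to 6 letters this pattern allows for the TLD. Note that the same length rule also rejects the single-letter endings in "a.b.c, a.b, a.a.a , a.r".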

JSFiddle: http://jsfiddle.net/nesutqjf/2/
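For testing outside the page, the same split-and-test approach can be sketched as a pure function that takes the text as an argument, like the validate(text) in the question. This is only a sketch: `validateList` and `HOSTNAME_RE` are hypothetical names, and the regex is the one above unchanged, so the TLD is still limited to 2 to 6 letters.

```javascript
// Hypothetical helper: same regex as above, but the text is passed in
// as a parameter instead of being read from the DOM.
var HOSTNAME_RE = /^([a-zA-Z0-9]([a-zA-Z0-9\-]{0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,6}$/;

function validateList(text) {
    // every() stops at the first hostname that fails, like the break in the loop.
    return text.split(',').every(function (part) {
        return HOSTNAME_RE.test(part.trim());
    });
}
```

Calling `validateList('domain.com,domain2.co.vu,sub.domain.org')` should give true, while `validateList('domain.me, domain2')` should give false.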

Upvotes: 1

cbreezier

Reputation: 1314

The answers saying to not use regex are perfectly fine, but I like regex so:

^\s*(?:(?:\w+(?:-+\w+)*\.)+[a-z]+)\s*(?:,\s*(?:(?:\w+(?:-+\w+)*\.)+[a-z]+)\s*)*$

Yeah... it's not so pretty. But it works - tested on your sample cases at http://regex101.com

Edit: OK, let's break it down. The pattern allows sub-domain-01.com and a--b.com, but not -.com.

Each subdomain thingo: \w+(?:-+\w+)* matches a string of word characters, optionally followed by more groups of word characters, each preceded by one or more dashes.

Each hostname: \s*(?:(?:\w+(?:-+\w+)*\.)+[a-z]+)\s* a bunch of subdomain thingos, each followed by a dot. Then finally a string of letters only (the TLD). And of course the optional spaces around the sides.

Whole thing: \s*(?:(?:\w+(?:-+\w+)*\.)+[a-z]+)\s*(?:,\s*(?:(?:\w+(?:-+\w+)*\.)+[a-z]+)\s*)* a single hostname, followed by 0 or more ,hostnames for our comma separated list.

Pretty simple really.
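The breakdown above can also be written as code by assembling the pattern from its pieces, so the hostname part only has to be typed once. This is a sketch rather than the tested pattern itself: the variable names and the `validate` wrapper are illustrative, and the redundant outer non-capturing group around each hostname is dropped.

```javascript
// Pieces of the pattern, named after the breakdown above (names are illustrative).
var label = '\\w+(?:-+\\w+)*';               // one "subdomain thingo"
var host = '(?:' + label + '\\.)+[a-z]+';    // labels with dots, then a letters-only TLD
var list = '^\\s*' + host + '\\s*(?:,\\s*' + host + '\\s*)*$';

var hostListRe = new RegExp(list);

function validate(text) {
    return hostListRe.test(text);
}
```

Because the backslashes live inside string literals here, each one has to be doubled.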

Upvotes: 3

Xotic750

Reputation: 23482

An example using validator.js, which has well-tested routines for checking a valid FQDN. Alternatively, look through its source and grab what you need.

function validate (e) {
    var target = e.target || e;
    
    target.value.split(',').some(function (item) {
        var notValid = !validator.isFQDN(item.trim());
        
        if (notValid) {
            target.classList.add('bad');
        } else {
            target.classList.remove('bad');
        }
      
      return notValid;
    });
}

var domains = document.getElementById('domains');

domains.addEventListener('change', validate);

validate(domains);

#domains {
    width: 300px;
    height: 100px;
}
.bad {
    background-color: red
}

<script src="http://rawgit.com/chriso/validator.js/master/validator.js"></script>
<textarea id="domains">www.example.com, example.com, example.ca, example, example.com example.nl www.example, www.exam ple.com</textarea>

Upvotes: 0

Mouser

Reputation: 13304

While @dandavis's answer/comment is impressive, let's break it down into steps.

1. Get the value from the textarea and trim() leading and trailing spaces.
2. Replace every run of whitespace with a single space using /\s+/g, which finds whitespace occurring one or more times.
3. Split on a comma with an optional space on either side. split() returns an array.
4. Iterate over every array element with filter().
5. Check whether the element is a valid domain. If so, keep it.

var domains = document.querySelector("textarea").value;
// Both spaces are optional so that "a.com,b.com" is split as well.
domains = domains.trim().replace(/\s+/g, " ").split(/\s?,\s?/);

var domainsTested = domains.filter(function (element) {
    // Keep only the elements that match the domain pattern.
    return /^[a-zA-Z0-9][a-zA-Z0-9-_]{0,61}[a-zA-Z0-9]{0,1}\.([a-zA-Z]{1,6}|[a-zA-Z0-9-]{1,30}\.[a-zA-Z]{2,3})$/.test(element);
});

document.write(domainsTested.join(" | ")); //this is just here to show the results.
document.write("<br />Domainstring is ok: " + (domainsTested.length == domains.length)); //If it's valid then this should be equal.

<textarea style="width: 300px; height: 100px">www.example.com    , example.com, example.ca,     example, example.com example.nl     www.example,    www.exam ple.com,  sub.sub.sub.domain.tv, do main.it,   sub.domain.tv</textarea>

Upvotes: 1

Terry

Reputation: 66133

I have been using this pattern for a while, and it seems to work for your case, too:

/^[a-zA-Z0-9][a-zA-Z0-9\-_]*\.([a-zA-Z0-9]+|[a-zA-Z0-9\-_]+\.[a-zA-Z]+)+$/gi

The logic is simple:

^[a-zA-Z0-9]: The URL must start with an alphanumeric character
[a-zA-Z0-9\-_]*: The first alphanumeric character can be followed by zero or more of: an alphanumeric character, an underscore or a dash
\.: The first piece must be followed by a period.
The second piece must follow the same pattern:
1. [a-zA-Z0-9]+: One or more alphanumeric characters, OR
2. [a-zA-Z0-9\-_]+\.[a-zA-Z]+: One or more alphanumeric characters, underscores or dashes, followed by a period and one or more letters

You can check this pattern working for most of your URLs in the following code snippet. The approach is similar to the strategy described by others:

1. Get the value of the textarea on keyup (or you can bind submit, blur, keypress, keydown, change, etc.)
2. Split the value by the , character
3. Use $.trim() to remove flanking whitespace
4. Use the RegEx pattern above to evaluate each individual string

Optional, done for visual output:

1. Generate a list of URLs
2. Indicate whether each URL entered is valid or not

$(function() {
    $('textarea').keyup(function() {
        var urls = $(this).val().split(',');
        $('ul').empty();
        $.each(urls, function(i,v) {
            // Trim URL
            var url = $.trim(v);
            
            // RegEx
            var pat = /^[a-zA-Z0-9][a-zA-Z0-9\-_]*\.([a-zA-Z0-9]+|[a-zA-Z0-9\-_]+\.[a-zA-Z]+)+$/gi,
                test = pat.test(url);
            
            // Append
            $('ul').append('<li>'+url+' <span>'+test+'</span></li>');
        });
    });
});
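One caveat if the pattern is ever moved out of the loop and reused: a regex with the g flag keeps its position in lastIndex between test() calls, so the same input can alternately pass and fail. The JavaScript above avoids this because the literal is re-created on every iteration. A small sketch of the effect, using a deliberately simplified stand-in pattern (not the one above):

```javascript
// Simplified stand-in pattern, only to show the lastIndex behaviour.
var shared = /^[a-z]+\.[a-z]+$/g;

var first = shared.test('example.com');   // matches; lastIndex moves to the end of the match
var second = shared.test('example.com');  // search starts at lastIndex, fails, and resets it to 0
var third = shared.test('example.com');   // matches again

// Without the g flag, test() always starts from index 0.
var plain = /^[a-z]+\.[a-z]+$/;
var stable = plain.test('example.com') && plain.test('example.com');
```

Dropping the g flag is the simplest way to sidestep this when only test() is needed.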

textarea {
    width: 100%;
    height: 100px;
}
ul span {
    background-color: #eee;
    display: inline-block;
    margin-left: .25em;
    padding: 0 .25em;
    text-transform: uppercase;
}

<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<textarea placeholder="Paste URLs here"></textarea>
<ul></ul>

Upvotes: 0

Ixrec

Reputation: 976

I would not use regexps for this, because you have a lot of different rules you want to check. Regexps are good when you only have a couple of rules that are very simple to express but a pain to write out as "parsing code".

I'd simply do hostnames.split(',').forEach(validateHostname);, as most of the comments suggest, and inside validateHostname reject any hostname that has spaces in the middle, two adjacent dots, no dots, ends in a dot, has non-ASCII characters, has digits in the last dot-separated token, and so on and so forth.

A function like this will be much easier to add new rules to than a regexp would be.
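A minimal sketch of that approach, assuming the rules listed in the question. `validateHostname` and `validateList` are illustrative names, and the rule set (ASCII letters, digits and dashes only, no digits in the last token) is one reading of the question's examples, not a full hostname specification. It uses every() rather than forEach() so that the result of each check is actually collected.

```javascript
// Checks one hostname against the rules from the question, one rule per line.
function validateHostname(hostname) {
    var name = hostname.trim();                    // spaces around a hostname are allowed
    if (name === '') return false;                 // empty entry, e.g. "a.com,,b.com"
    if (/\s/.test(name)) return false;             // spaces in the middle
    if (/[^a-z0-9.\-]/i.test(name)) return false;  // non-ASCII or other disallowed characters

    var tokens = name.split('.');
    if (tokens.length < 2) return false;           // no dots at all
    // An empty token means a leading dot, a trailing dot or two adjacent dots.
    if (tokens.indexOf('') !== -1) return false;

    var last = tokens[tokens.length - 1];
    if (!/^[a-z]+$/i.test(last)) return false;     // last token must be letters only

    return true;
}

// every() returns false as soon as one hostname is rejected.
function validateList(text) {
    return text.split(',').every(validateHostname);
}
```

A new rule, such as a maximum label length or a ban on leading dashes, becomes one more line in `validateHostname`.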

Upvotes: 0
