Reputation: 69346
I'm trying to validate the content of a <textarea> using JavaScript, so I created a validate() function, which returns true or false depending on whether the text inside the textarea is valid.
The textarea can only contain comma-separated hostnames. By hostname I mean something like subdomain.domain.com, so basically some dot-separated strings. Since users don't tend to type very carefully, I also want to allow any number of spaces between the hostnames and the commas, but not inside a hostname.
Here are some examples of what should or shouldn't match:
Should match:
domain.com,domain2.co.vu,sub.domain.org
domai2n.com , dom-ain.org.co.vu.nl ,domain.it
dom-ain.it, domain.com, domain.eu.org.something
a.b.c, a.b, a.a.a , a.r
0191481.com
Should not match:
domain.com., sub.domain.it  (incomplete hostname)
domain.me, domain2  (incomplete hostname)
sub.sub.sub.domain.tv, do main.it  (hostname contains spaces)
site  (incomplete hostname)
hèy.com  (hostname cannot contain accents)
hey.01com  (hostname cannot end with numbers or strings containing numbers)
hello.org..wow  (incomplete hostname)
I built my function using the following code:
function validate(text) {
  return (
    (/^([a-z0-9\-\.]+ *, *)*[a-z0-9\-\.]+[^, ]$/i.test(text)
      && !/\.[^a-z]|\.$/i.test(text)
      && ~text.indexOf('.'))
  );
}
Unfortunately, my function doesn't work: it fails to recognize incomplete hostnames and returns true for them.
Is there any way to accomplish this? A solution without regular expressions is fine, although I'd prefer a single RegExp.
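Running the validate() above unchanged on the "domain.me, domain2" case shows where it goes wrong (a quick diagnostic, not a fix): the dot check ~text.indexOf('.') looks at the whole text rather than at each hostname, so the dot in "domain.me" satisfies it even though "domain2" has none. It also shows that when all three checks pass, the function returns the number produced by ~ rather than true.

```javascript
// The question's validate(), copied unchanged.
function validate(text) {
  return (
    (/^([a-z0-9\-\.]+ *, *)*[a-z0-9\-\.]+[^, ]$/i.test(text)
      && !/\.[^a-z]|\.$/i.test(text)
      && ~text.indexOf('.'))
  );
}

// indexOf('.') is 6 (the dot in "domain.me"), and ~6 is -7: truthy, so accepted.
console.log(validate("domain.me, domain2")); // -7

// A valid list also yields a number (~7 === -8), not the boolean true.
console.log(validate("0191481.com")); // -8

// The second regex does catch "..", so this one is rejected with a real false.
console.log(validate("hello.org..wow")); // false
```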
Upvotes: 4
Views: 5620
Reputation: 87
This regex should satisfy every requirement for domains. It limits the TLD to 24 characters because that is currently the longest TLD, but you can raise it to the theoretical maximum of 63 characters (then you must change "25" to "64"; keep in mind that there are two instances of it):
^\s*(?!.*?_.*?)(?!(?:[\d\w]+?\.)?\-[\w\d\.\-]*?)(?![\w\d]+?\-\.(?:[\d\w\.\-]+?))(?=[\w\d])(?=[\w\d\.\-]*?\.+[\w\d\.\-]*?)(?![\w\d\.\-]{254})(?!(?:\.?[\w\d\-\.]*?[\w\d\-]{64,}\.)+?)[\w\d\.\-]+?(?<![\w\d\-\.]*?\.[\d]+?)(?<=[\w\d\-]{2,})(?<![\w\d\-]{25})(\s*,\s*(?!.*?_.*?)(?!(?:[\d\w]+?\.)?\-[\w\d\.\-]*?)(?![\w\d]+?\-\.(?:[\d\w\.\-]+?))(?=[\w\d])(?=[\w\d\.\-]*?\.+[\w\d\.\-]*?)(?![\w\d\.\-]{254})(?!(?:\.?[\w\d\-\.]*?[\w\d\-]{64,}\.)+?)[\w\d\.\-]+?(?<![\w\d\-\.]*?\.[\d]+?)(?<=[\w\d\-]{2,})(?<![\w\d\-]{25}))*\s*$
Here you can test it: https://regex101.com/r/ZyPMn4/1
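A sketch of using this pattern from JavaScript, assuming an engine with ES2018 lookbehind support (the pattern uses (?<=...) and (?<!...)). Because the regex is one hostname pattern written out twice, the sketch builds it from a single HOSTNAME string, so the "25" and "64" limits only need editing in one place; the capturing group is made non-capturing, which does not change what test() returns. HOSTNAME, HOSTNAME_LIST and isValidHostnameList are hypothetical names.

```javascript
// Hostname sub-pattern copied from the answer's regex (it appears there twice).
const HOSTNAME = String.raw`(?!.*?_.*?)(?!(?:[\d\w]+?\.)?\-[\w\d\.\-]*?)(?![\w\d]+?\-\.(?:[\d\w\.\-]+?))(?=[\w\d])(?=[\w\d\.\-]*?\.+[\w\d\.\-]*?)(?![\w\d\.\-]{254})(?!(?:\.?[\w\d\-\.]*?[\w\d\-]{64,}\.)+?)[\w\d\.\-]+?(?<![\w\d\-\.]*?\.[\d]+?)(?<=[\w\d\-]{2,})(?<![\w\d\-]{25})`;

// One hostname, then zero or more ", hostname" with optional spaces around commas.
const HOSTNAME_LIST = new RegExp(
  String.raw`^\s*` + HOSTNAME + String.raw`(?:\s*,\s*` + HOSTNAME + String.raw`)*\s*$`
);

function isValidHostnameList(text) {
  return HOSTNAME_LIST.test(text);
}

console.log(isValidHostnameList("domain.com,domain2.co.vu,sub.domain.org"));
console.log(isValidHostnameList("domain.me, domain2"));
```

Note that the (?<=[\w\d\-]{2,}) lookbehind requires the last two characters to be letters, digits or dashes, so one-letter TLDs such as those in the question's "a.b.c, a.b, a.a.a , a.r" example are rejected.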
Upvotes: 0
Reputation: 497
function validate() {
  // Get the user input
  var hostnames = document.getElementById('yourtextarea').value;
  // Regex to validate a single hostname
  var re = /^([a-zA-Z0-9]([a-zA-Z0-9\-]{0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,6}$/;
  // Trim whitespace
  hostnames = hostnames.trim();
  // Explode into an array
  hostnames = hostnames.split(",");
  // Loop through the array and test each hostname with the regex
  var is_valid = true;
  for (var i = 0; i < hostnames.length; i++) {
    var hostname = hostnames[i].trim();
    if (re.test(hostname)) {
      is_valid = true; // if valid, continue the loop
    } else {
      is_valid = false; // if invalid, break the loop and return false
      break;
    }
  } // end for loop
  return is_valid;
} // end function validate()
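This handles the examples you listed, except that it rejects "dom-ain.it, domain.com, domain.eu.org.something", because "something" is longer than the 6-letter TLD limit ([a-zA-Z]{2,6}), and "a.b.c, a.b, a.a.a , a.r", because one-letter TLDs are below the 2-letter minimum.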
JSFiddle: http://jsfiddle.net/nesutqjf/2/
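The same loop can be written more compactly with Array.prototype.every, which stops at the first hostname that fails, just like the break above. This sketch takes the text as a parameter instead of reading the DOM; validateHostnames and HOSTNAME_RE are hypothetical names, and the regex is the one from this answer.

```javascript
// Same per-hostname regex as the answer above.
const HOSTNAME_RE = /^([a-zA-Z0-9]([a-zA-Z0-9\-]{0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,6}$/;

// Takes the text as an argument, e.g. validateHostnames(textarea.value).
function validateHostnames(text) {
  return text
    .trim()
    .split(",")
    // every() returns false as soon as one trimmed hostname fails the regex
    .every((hostname) => HOSTNAME_RE.test(hostname.trim()));
}

console.log(validateHostnames("domai2n.com , dom-ain.org.co.vu.nl ,domain.it"));
console.log(validateHostnames("domain.me, domain2"));
```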
Upvotes: 1
Reputation: 1314
The answers saying not to use regex are perfectly fine, but I like regex, so:
^\s*(?:(?:\w+(?:-+\w+)*\.)+[a-z]+)\s*(?:,\s*(?:(?:\w+(?:-+\w+)*\.)+[a-z]+)\s*)*$
Yeah... it's not so pretty, but it works: I tested it on your sample cases at http://regex101.com.
Edit: OK, let's break it down. The pattern now allows sub-domain-01.com and a--b.com, but not -.com.
Each subdomain thingo: \w+(?:-+\w+)*
matches a string of word characters, optionally followed by more runs of word characters, each preceded by one or more dashes.
Each hostname: \s*(?:(?:\w+(?:-+\w+)*\.)+[a-z]+)\s*
one or more subdomain thingos, each followed by a dot, then finally a string of letters only (the TLD), with optional spaces on either side.
Whole thing: \s*(?:(?:\w+(?:-+\w+)*\.)+[a-z]+)\s*(?:,\s*(?:(?:\w+(?:-+\w+)*\.)+[a-z]+)\s*)*
a single hostname, followed by zero or more ",hostname"s for our comma-separated list.
Pretty simple really.
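Assembling the full pattern from the three pieces of the breakdown makes it easier to read and change. This sketch builds exactly the regex given at the top of this answer; LABEL, HOST and HOST_LIST are hypothetical names. The pattern has no i flag, so an uppercase TLD such as EXAMPLE.COM is rejected, and since \w includes the underscore, a name like my_host.com is accepted.

```javascript
// Each subdomain "thingo": word characters, optionally joined by runs of dashes.
const LABEL = String.raw`\w+(?:-+\w+)*`;
// Each hostname: one or more labels each followed by a dot, then a letters-only
// TLD, with optional spaces on either side.
const HOST = String.raw`\s*(?:(?:${LABEL}\.)+[a-z]+)\s*`;
// Whole thing: a hostname followed by zero or more ",hostname"s.
const HOST_LIST = new RegExp(`^${HOST}(?:,${HOST})*$`);

const shouldMatch = [
  "domain.com,domain2.co.vu,sub.domain.org",
  "domai2n.com , dom-ain.org.co.vu.nl ,domain.it",
  "dom-ain.it, domain.com, domain.eu.org.something",
  "a.b.c, a.b, a.a.a , a.r",
  "0191481.com",
];
const shouldNotMatch = [
  "domain.com., sub.domain.it",
  "domain.me, domain2",
  "sub.sub.sub.domain.tv, do main.it",
  "site",
  "hèy.com",
  "hey.01com",
  "hello.org..wow",
];

for (const s of shouldMatch) console.log(HOST_LIST.test(s), s);
for (const s of shouldNotMatch) console.log(HOST_LIST.test(s), s);
```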
Upvotes: 3
Reputation: 23482
An example with validator.js, which has well-tested routines for checking whether a string is a valid FQDN. Alternatively, look through the source and grab what you need.
function validate (e) {
  var target = e.target || e;
  target.value.split(',').some(function (item) {
    var notValid = !validator.isFQDN(item.trim());
    if (notValid) {
      target.classList.add('bad');
    } else {
      target.classList.remove('bad');
    }
    return notValid;
  });
}
var domains = document.getElementById('domains');
domains.addEventListener('change', validate);
validate(domains);
#domains {
  width: 300px;
  height: 100px;
}
.bad {
  background-color: red
}
<script src="http://rawgit.com/chriso/validator.js/master/validator.js"></script>
<textarea id="domains">www.example.com, example.com, example.ca, example, example.com example.nl www.example, www.exam ple.com</textarea>
Upvotes: 0
Reputation: 13304
While @dandavis's answer/comment is impressive, let's break it down into steps.
trim() removes the leading and ending spaces.
/\s+/g matches every run of one or more whitespace characters (g means every occurrence), and replace() turns each run into a single space.
split() then splits the string on ,<space> or <space>,<space> (the space on either side is optional, so a bare , works too). Split returns an array.
var domains = document.querySelector("textarea").value;
domains = domains.trim().replace(/\s+/g, " ").split(/\s?,\s?/);
var domainsTested = domains.filter(function (element) {
  // keep only the elements that look like a valid domain
  return /^[a-zA-Z0-9][a-zA-Z0-9-_]{0,61}[a-zA-Z0-9]{0,1}\.([a-zA-Z]{1,6}|[a-zA-Z0-9-]{1,30}\.[a-zA-Z]{2,3})$/.test(element);
});
document.write(domainsTested.join(" | ")); //this is just here to show the results.
document.write("<br />Domainstring is ok: " + (domainsTested.length == domains.length)); //If it's valid then this should be equal.
<textarea style="width: 300px; height: 100px">www.example.com , example.com, example.ca, example, example.com example.nl www.example, www.exam ple.com, sub.sub.sub.domain.tv, do main.it, sub.domain.tv</textarea>
Upvotes: 1
Reputation: 66133
I have been using this pattern for a while, and it seems to work for your case, too:
/^[a-zA-Z0-9][a-zA-Z0-9\-_]*\.([a-zA-Z0-9]+|[a-zA-Z0-9\-_]+\.[a-zA-Z]+)+$/gi
The logic is simple:
^[a-zA-Z0-9]: The URL must start with an alphanumeric character.
[a-zA-Z0-9\-_]*: The first alphanumeric character can be followed by zero or more alphanumeric characters, underscores or dashes.
\.: The first piece must be followed by a period.
[a-zA-Z0-9]+: One or more alphanumeric characters, OR
[a-zA-Z0-9\-_]+\.[a-zA-Z]+: One or more alphanumeric characters, underscores or dashes, followed by a period and one or more letters.
The group holding these two alternatives can repeat one or more times.
You can see this pattern working for most of your URLs in the following code snippet. My approach is similar to the strategy described by others:
Split the input on the , character.
Use $.trim() to remove flanking whitespace.
Optional, done for visual output: list each URL with its test result.
$(function() {
  $('textarea').keyup(function() {
    var urls = $(this).val().split(',');
    $('ul').empty();
    $.each(urls, function(i,v) {
      // Trim URL
      var url = $.trim(v);
      // RegEx
      var pat = /^[a-zA-Z0-9][a-zA-Z0-9\-_]*\.([a-zA-Z0-9]+|[a-zA-Z0-9\-_]+\.[a-zA-Z]+)+$/gi,
          test = pat.test(url);
      // Append
      $('ul').append('<li>'+url+' <span>'+test+'</span></li>');
    });
  });
});
textarea {
  width: 100%;
  height: 100px;
}
ul span {
  background-color: #eee;
  display: inline-block;
  margin-left: .25em;
  padding: 0 .25em;
  text-transform: uppercase;
}
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<textarea placeholder="Paste URLs here"></textarea>
<ul></ul>
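One caution if you reuse the pattern outside this snippet: it carries the g flag, and a global regex remembers lastIndex between test() calls. The snippet avoids this by creating a new regex for every URL, but a single shared literal alternates between true and false on the same input. A sketch without the g flag (HOST_PATTERN and isValidList are hypothetical names):

```javascript
// Pattern from the answer above, including its g flag.
const withG = /^[a-zA-Z0-9][a-zA-Z0-9\-_]*\.([a-zA-Z0-9]+|[a-zA-Z0-9\-_]+\.[a-zA-Z]+)+$/gi;
console.log(withG.test("domain.com")); // true, and lastIndex becomes 10
console.log(withG.test("domain.com")); // false: the search resumed at index 10
console.log(withG.test("domain.com")); // true again, since the failure reset lastIndex to 0

// Without g, test() keeps no state, so one regex can be shared safely.
const HOST_PATTERN = /^[a-zA-Z0-9][a-zA-Z0-9\-_]*\.([a-zA-Z0-9]+|[a-zA-Z0-9\-_]+\.[a-zA-Z]+)+$/i;

function isValidList(text) {
  // Split on commas, trim each piece, and require every piece to match.
  return text.split(",").every((part) => HOST_PATTERN.test(part.trim()));
}
```

As the answer says, this covers most but not all of the question's examples: hey.01com is accepted, because the first alternative allows digits in the last piece.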
Upvotes: 0
Reputation: 976
I would not use regexps for this, because you have a lot of different rules you want to check. Regexps are good when you only have a couple of rules that are very simple to express but a pain to write out as "parsing code".
I'd simply do hostnames.split(',').forEach(validateHostname), as most of the comments suggest, and inside validateHostname reject any hostname that has spaces in the middle, two adjacent dots, no dots, ends in a dot, has non-ASCII characters, has digits in the last dot-separated token, and so on.
A function like this will be much easier to add new rules to than a regexp would be.
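A minimal sketch of such a validateHostname without regular expressions, implementing only the rules listed above. Allowing only ASCII letters, digits and '-' inside a label is an assumption, as is the validateHostnameList wrapper name; every() combines the per-hostname results into a single true/false.

```javascript
// Hypothetical helper following the rule list above. Assumption: a label may
// contain only ASCII letters, digits and '-'.
function validateHostname(raw) {
  const host = raw.trim(); // spaces around the commas are allowed
  const labels = host.split(".");
  if (labels.length < 2) return false; // no dots at all
  for (const label of labels) {
    // An empty label means two adjacent dots, or a leading/trailing dot.
    if (label === "") return false;
    for (const ch of label) {
      const isLetter = (ch >= "a" && ch <= "z") || (ch >= "A" && ch <= "Z");
      const isDigit = ch >= "0" && ch <= "9";
      // Rejects inner spaces, accented and other non-ASCII characters.
      if (!isLetter && !isDigit && ch !== "-") return false;
    }
  }
  // The last dot-separated token must not contain digits.
  for (const ch of labels[labels.length - 1]) {
    if (ch >= "0" && ch <= "9") return false;
  }
  return true;
}

// Hypothetical wrapper: the whole list is valid only if every hostname is.
function validateHostnameList(text) {
  return text.split(",").every(validateHostname);
}

console.log(validateHostnameList("a.b.c, a.b, a.a.a , a.r"));
console.log(validateHostnameList("hey.01com"));
```

Adding a rule later means adding one more check inside validateHostname.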
Upvotes: 0