Reputation: 346
I'm having an issue with some spam links at the moment. My users are allowed to post hyperlinks in an input box and I'd like to be able to restrict this to only certain domains (if a hyperlink is found in the text).
The spam has gotten to a point where I am disabling all hyperlinks using the following regex:
if(new RegExp("([a-zA-Z0-9]+://)?([a-zA-Z0-9_]+:[a-zA-Z0-9_]+@)?([a-zA-Z0-9.-]+\\.[A-Za-z]{2,4})(:[0-9]+)?(/.*)?").test(contentString)){
alert("URLs are not allowed!");
return false;
}
I want to ease this up a bit and only allow specific hyperlink domains.
I tried the following, which I found elsewhere:
function isAllowed(urlString)
{
var allowed = ['example.com', 'stackoverflow.com', 'google.com'];
var urlObject = new URL(urlString);
return allowed.indexOf(urlObject.host) > -1;
}
console.log(isAllowed('http://example.com/path/?q=1')); // true
console.log(isAllowed('https://subdomain.example.com/')); // false
console.log(isAllowed('http://stacksnippets.net')); // false
if (!isAllowed(document.getElementById('yourTextbox').value))
{
alert('Domain is not allowed!');
}
However, this only works if the string is itself a hyperlink, not when the hyperlink is embedded in other text, so I am a bit stumped on how to accomplish this.
Upvotes: 0
Views: 129
Reputation: 17382
Let's assume your regex works (I didn't look into it very deeply). Then, instead of just using .test(...) to check whether a string matches the regex, you can use .exec(...) to get more information about the match, in particular its capture groups. Additionally, the g (global) flag lets you get all matches of the regex in the string, not only the first one. You should probably also add the i flag to make your regex case-insensitive.
var alloweddomains = ["stackoverflow.com", "google.com", ...];
var regex = new RegExp(..., "gi");
var teststring = "foo bar https://stackoverflow.com/questions/123456 baz blubb";
// RegExp objects have no .match() method; use .exec() instead
var m = regex.exec(teststring);
//while m !== null, the regexp returned a match
//so this iterates over all matches in the teststring
while (m !== null) {
//handle the current match. see explanation below
....
//get the next match (with the g flag, exec() continues after the previous one)
m = regex.exec(teststring);
}
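regex.exec() returns null if the regex finds no (further) match, i.e. the string contains no more URLs, or an array containing the whole match and its capture groups. For this particular example, with the path group ending at the first whitespace as suggested in the edit further down, it returns the following array: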
["https://stackoverflow.com/questions/123456",
"https://",
undefined,
"stackoverflow.com",
undefined,
"/questions/123456"
]
The first element in the array is the whole match; the other elements are the capturing groups defined by (...) in your regex. For your use case, m[3] is of particular interest, because it contains the domain of the URL. With that information you can now easily check whether the domain is included in your list of allowed domains:
if (!alloweddomains.includes(m[3].toLowerCase())) {
alert("domain is not allowed");
}
Also use .toLowerCase() in the check: with the i flag, the regex will also match "HTTPS://STACKOVERFLOW.COM", but .includes() is case-sensitive, so it wouldn't find an uppercase domain in the alloweddomains array.
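To illustrate the case-sensitivity point, here is a minimal sketch. The allowlist and the matchedDomain value are made-up sample data, not taken from the question.

```javascript
// Hypothetical allowlist and a domain as the regex might capture it with the i flag.
var allowedDomains = ["stackoverflow.com", "google.com"];
var matchedDomain = "STACKOVERFLOW.COM";

// includes() compares strings exactly, so the uppercase form is not found.
var rawCheck = allowedDomains.includes(matchedDomain);

// Lowercasing the captured domain first makes the lookup succeed.
var normalizedCheck = allowedDomains.includes(matchedDomain.toLowerCase());

console.log(rawCheck, normalizedCheck); // false true
```

This assumes every entry in the allowlist is itself stored in lowercase.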
EDIT
On a closer look, the very last part of your regex could be problematic: (/.*)? will match the entire rest of the string if the text before it is indeed a URL. I'd suggest using something like (/[^\s]*)? instead, so that the match ends at the first whitespace. (Inside the string you pass to new RegExp, the backslash has to be doubled: \\s.)
Limiting the TLD to 2-4 characters also doesn't seem correct anymore. There are many longer TLDs, like .cityname or whatever, which won't be matched.
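Putting the pieces together, a rough sketch could look like the following. The function name findDisallowedDomains and the allowlist are made up for illustration, and widening the TLD part to {2,} is just one possible way to address the length limit mentioned above, not a tested recommendation.

```javascript
// Hypothetical allowlist; entries are assumed to be lowercase.
var allowedDomains = ["stackoverflow.com", "google.com"];

// Returns the domains found in `text` that are not on the allowlist.
function findDisallowedDomains(text) {
  // The question's pattern with two changes: the path group stops at
  // whitespace, and the TLD length is {2,} instead of {2,4} (an assumption).
  // Creating the RegExp per call avoids a stale lastIndex from the g flag.
  var regex = new RegExp(
    "([a-zA-Z0-9]+://)?([a-zA-Z0-9_]+:[a-zA-Z0-9_]+@)?" +
    "([a-zA-Z0-9.-]+\\.[A-Za-z]{2,})(:[0-9]+)?(/[^\\s]*)?",
    "gi"
  );
  var rejected = [];
  var m;
  // exec() returns one match per call and null when there are no more.
  while ((m = regex.exec(text)) !== null) {
    var domain = m[3].toLowerCase(); // group 3 holds the domain
    if (allowedDomains.indexOf(domain) === -1) {
      rejected.push(domain);
    }
  }
  return rejected;
}

var sample = "see https://stackoverflow.com/questions/123456 and HTTP://Spam.example.org/buy now";
console.log(findDisallowedDomains(sample)); // [ 'spam.example.org' ]
```

Note that this compares the full host, so a subdomain such as subdomain.example.com is rejected unless it is listed explicitly, and anything shaped like word.word in ordinary text is treated as a domain too.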
Upvotes: 2
Reputation: 93
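var textBoxValue = document.getElementById('yourTextbox').value;
// get all the domains, subdomains from the textbox value
// match() returns null when nothing is found, so fall back to an empty array
var domains = textBoxValue.match(/\w+?(\.\w+)+/g) || [];
domains.forEach(function (domain) {
// isAllowed() from the question calls new URL(), which throws without a scheme
if (!isAllowed('http://' + domain)) {
// alert('Domain is not allowed!');
console.log(domain + ' is not allowed!');
}
});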
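One thing to watch with this approach: isAllowed() from the question passes its argument to new URL(), which throws a TypeError for a bare domain without a scheme. A small sketch of that behaviour, using a hypothetical helper hostOf that is not part of the original code:

```javascript
// Hypothetical helper: returns the host of a URL string, or null if
// the string cannot be parsed as an absolute URL.
function hostOf(candidate) {
  try {
    return new URL(candidate).host;
  } catch (e) {
    // new URL() throws a TypeError for input such as "example.com"
    return null;
  }
}

console.log(hostOf("example.com"));        // null
console.log(hostOf("http://example.com")); // example.com
```

So the strings extracted by the regex either need a scheme prepended before they are handed to isAllowed(), or they can be compared against the allowed list directly.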
Upvotes: 0