Reputation: 1321
I wrote the following regex:
(https?:\/\/)?([da-z\.-]+)\.([a-z]{2,6})(\/(\w|-)*)*\/?
Its behaviour can be seen here: http://gskinner.com/RegExr/?34b8m
I wrote the following JavaScript code:
var urlexp = new RegExp(
'^(https?:\/\/)?([da-z\.-]+)\.([a-z]{2,6})(\/(\w|-)*)*\/?$', 'gi'
);
document.write(urlexp.test("blaaa"))
And it returns true
even though the regex was supposed to not allow single words as valid.
What am I doing wrong?
Upvotes: 5
Views: 7442
Reputation: 6800
Your problem is that JavaScript is viewing all your escape sequences as escapes for the string. So your regex goes to memory looking like this:
^(https?://)?([da-z.-]+).([a-z]{2,6})(/(w|-)*)*/?$
Which you may notice causes a problem in the middle when what you thought was a literal period turns into a regular expressions wildcard. You can solve this in a couple ways. Using the forward slash regular expression syntax JavaScript provides:
var urlexp = /^(https?:\/\/)?([da-z\.-]+)\.([a-z]{2,6})(\/(\w|-)*)*\/?$/gi
Or by escaping your backslashes (and not your forward slashes, as you had been doing - that's exclusively for when you're using /regex/mod
notation, just like you don't have to escape your single quotes in a double quoted string and vice versa):
var urlexp = new RegExp('^(https?://)?([da-z.-]+)\\.([a-z]{2,6})(/(\\w|-)*)*/?$', 'gi')
Please note the double backslash before the w - also necessary for matching word characters.
A couple notes on your regular expression itself:
[da-z.-]
d
is contained in the a-z range. Unless you meant \d
? In that case, the slash is important.
(/(\w|-)*)*/?
My own misgivings about the nested Kleene stars aside, you can whittle that alternation down into a character class, and drop the terminating /?
entirely, as a trailing slash will be match by the group as you've given it. I'd rewrite as:
(/[\w-]*)*
Though, maybe you'd just like to catch non space characters?
(/[^/\s]*)*
Anyway, modified this way your regular expression winds up looking more like:
^(https?://)?([\da-z.-]+)\.([a-z]{2,6})(/[\w-]*)*$
Remember, if you're going to use string notation: Double EVERY backslash. If you're going to use native /regex/mod
notation (which I highly recommend), escape your forward slashes.
Upvotes: 9