Neha Choudhary

Reputation: 4870

JavaScript engine hangs when testing this regex against long strings

This regex is meant to reject URLs that start with any URL scheme or with slashes (forward or backward), while allowing URLs such as "domain.tld" that start with neither. It should also allow strings that are not URLs at all ("some random input").

^(?!://)((?!//))(?!(.*?)*://)(?!:\\\\)(?!:/\\\\/\\\\)(?!(.*?)*:/\\\\/\\\\)(?!/\\\\/\\\\)(?!\\\\)(?!(.*?)*:\\\\)(?!www.)(?!(.*?)*.www.).*$

This regex works fine in Java, but in JavaScript it fails on longer strings.

Example: it works fine for "hey. hey hey hey hey", starts to slow down with "hey. hey hey hey hey ", and hangs on "hey. hey hey hey hey hey hey".

The regex should be tested against the following cases:

String                  | Expected result
__________________________________________
http://www.google.com   | False
HTTP://WWW.google.com   | False
adasd://www.google.com  | False
ftp://www.google.com    | False
mailto://www.google.com | False
//www.google.com        | False
://www.google.com       | False
www.google.com          | False
WWW.google.com          | False
test .http://google.com | False
skksdwww.google.com     | False
wWW.google.com          | False
://google.com           | False
.www.google.com         | False
as;;; .wwW.google.com   | False
as.wwW.google.com       | False
= #$@%@#.www.google.com | False
http:/\\/\\google.com   | False
:/\\/\\google.com       | False
http://gogle.com        | False
gogle.com //google.com  | False
google.com              | True
some random input       | True

What could be the problem with it?

UPDATE: I have updated the regex as per Wiktor Stribiżew's comment and it works fine.

Upvotes: 4

Views: 139

Answers (2)

Wiktor Stribiżew

Reputation: 626709

The (.*?)* subpattern is disastrous inside larger patterns. The nested * quantifiers (lazy inside, greedy outside) let the regex engine try a huge number of ways to split the input before it can report failure on a string that should not be matched. This is commonly called catastrophic backtracking.
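
A rough way to see the blow-up is to isolate one of the nested-quantifier lookaheads from the question and time it against a flat equivalent on progressively longer inputs. This is only a sketch: `slow`, `flat` and `timeMs` are illustrative names, the inputs are deliberately kept short so the script finishes quickly, and the actual timings will vary by engine and machine.

```javascript
// One lookahead from the question, isolated: nested quantifiers (.*?)*
const slow = /^(?!(.*?)*:\/\/).*$/;
// Same intent without nesting: reject any string that contains "://"
const flat = /^(?!.*?:\/\/).*$/;

// Illustrative helper: returns the match result and the elapsed milliseconds.
function timeMs(re, input) {
  const start = Date.now();
  const matched = re.test(input);
  return { matched, ms: Date.now() - start };
}

// Inputs without "://" force the lookahead to try every alternative
// before it can give up. Lengths are kept small on purpose.
const inputs = [10, 14, 18].map((n) => "a".repeat(n));

const results = inputs.map((input) => ({
  length: input.length,
  slow: timeMs(slow, input),
  flat: timeMs(flat, input),
}));

for (const r of results) {
  console.log(`len=${r.length} nested=${r.slow.ms}ms flat=${r.flat.ms}ms`);
}
```

Both patterns return the same answer for these inputs. Only the amount of work differs, and for the nested version it should grow sharply with each extra character, which would be consistent with the question's report that a few more words are enough to make the engine hang.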

Always test your patterns against strings that should not match.

Also, if you need to match a literal dot, escape it.
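
As a small illustration of the difference, the two patterns below are simplified stand-ins (not taken from the question) that differ only in whether the dot is escaped:

```javascript
// Unescaped: "." matches any character except a line terminator.
const unescaped = /^www./i;
// Escaped: "\." matches only a literal dot.
const escaped = /^www\./i;

const sample = "wwwXgoogle.com";

console.log(unescaped.test(sample));           // true: "X" satisfies "."
console.log(escaped.test(sample));             // false: "X" is not a dot
console.log(escaped.test("www.google.com"));   // true
```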

Here is your fixed and contracted regex:

^
  (?!.*?:?//)
  (?!:(?:/\\\\/)?\\\\)
  (?!(?:.*?:)?(?:/\\\\/)?\\\\)
  (?!(?:.*?\.)?www\.)
 .*
$

Or a one-liner:

^(?!.*?:?//)(?!:(?:/\\\\/)?\\\\)(?!(?:.*?:)?(?:/\\\\/)?\\\\)(?!(?:.*?\.)?www\.).*$

See the regex demo

Upvotes: 1

Whothehellisthat

Reputation: 2152

I didn't examine the whole thing (wow, that's a lot of slashes!) but you could greatly simplify the regex, I'm guessing. Just going from your post, maybe this would work for you:

/^(?!.*(?:www\.|\/\/|\/\\\\\/\\\\))/i

It should test negative for any protocol, including an empty one and file: URLs. Let me know if I've missed anything the regex needs to test for.

UPDATE: Now passes all tests.
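
A minimal way to check this yourself is to run the pattern against the table from the question. The sketch below uses the regex exactly as written above. The names `noUrl`, `cases` and `failures` are illustrative, and the two backslash rows are left out on purpose because how they should be written as JavaScript string literals is an assumption I'd rather not make here. The last row is the long input from the question, assumed to be an allowed string.

```javascript
// Pattern from this answer, unchanged.
const noUrl = /^(?!.*(?:www\.|\/\/|\/\\\\\/\\\\))/i;

// [input, expected result] pairs from the question (backslash rows omitted).
const cases = [
  ["http://www.google.com", false],
  ["HTTP://WWW.google.com", false],
  ["adasd://www.google.com", false],
  ["ftp://www.google.com", false],
  ["mailto://www.google.com", false],
  ["//www.google.com", false],
  ["://www.google.com", false],
  ["www.google.com", false],
  ["WWW.google.com", false],
  ["test .http://google.com", false],
  ["skksdwww.google.com", false],
  ["wWW.google.com", false],
  ["://google.com", false],
  [".www.google.com", false],
  ["as;;; .wwW.google.com", false],
  ["as.wwW.google.com", false],
  ["= #$@%@#.www.google.com", false],
  ["http://gogle.com", false],
  ["gogle.com //google.com", false],
  ["google.com", true],
  ["some random input", true],
  // Long input that made the original regex hang (assumed to be allowed).
  ["hey. hey hey hey hey hey hey", true],
];

// Collect every row where the regex disagrees with the expected result.
const failures = cases.filter(
  ([input, expected]) => noUrl.test(input) !== expected
);

console.log(failures.length === 0 ? "all cases pass" : failures);
```

Because the pattern has a single lookahead with one unnested `.*`, it has no nested quantifiers to backtrack through, so the long input should return promptly.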

Upvotes: 1
