Reputation: 3410
Let's say I have this URL:
https://www.google.com/search?q=test&tbm=isch&randomParameters=123
I want to match Google's search URL only when it contains none of the following:
tbm=isch
tbm=news
param1=432
I've tried this pattern:
^http(s):\/\/www.google.(.*)\/(search|webhp)\?(?![\s]+(tbm=isch|tbm=news|param1=432))
but it's not working: it still matches the sample URL.
Upvotes: 0
Views: 89
Reputation: 43169
You could use:
^ # anchor it to the beginning
https?:// # http or https
(?:
(?!tbm=(?:isch|news)) # first neg. lookahead
(?!param1=432) # second
\S # anything but whitespace
)+
$ # THE END
See a demo on regex101.com.
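As a rough sketch of how the commented pattern above could be used, here it is compiled with Python's re.VERBOSE flag, which ignores whitespace and # comments inside the pattern. The choice of Python and the names URL_PATTERN and is_allowed are assumptions for illustration, not part of the question.

```python
import re

# The commented pattern from this answer, compiled in verbose mode.
URL_PATTERN = re.compile(
    r"""
    ^                          # anchor it to the beginning
    https?://                  # http or https
    (?:
        (?!tbm=(?:isch|news))  # first negative lookahead
        (?!param1=432)         # second negative lookahead
        \S                     # anything but whitespace
    )+
    $                          # the end
    """,
    re.VERBOSE,
)


def is_allowed(url):
    """Return True when the URL contains none of the excluded parameters."""
    return URL_PATTERN.match(url) is not None


sample = "https://www.google.com/search?q=test&tbm=isch&randomParameters=123"
print(is_allowed(sample))
print(is_allowed("https://www.google.com/search?q=test"))
```

Note that this pattern checks every position in the URL, so it does not restrict the host or path to Google's search pages.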
There might be built-in methods, such as urlparse(),
in your specific programming language that handle this more directly, though.
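If the language happens to be Python (an assumption, since the question does not say), a minimal sketch of the parsing approach with the standard urllib.parse module might look like this. The function name is_plain_google_search and the BLOCKED set are hypothetical, and unlike the regex it does not require a query string to be present.

```python
from urllib.parse import urlsplit, parse_qsl

# Hypothetical set of (name, value) pairs that disqualify a URL.
BLOCKED = {("tbm", "isch"), ("tbm", "news"), ("param1", "432")}


def is_plain_google_search(url):
    """Return True for a Google search/webhp URL with no blocked parameter."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return False
    host = parts.hostname or ""
    if not host.startswith("www.google."):
        return False
    if parts.path not in ("/search", "/webhp"):
        return False
    # Compare whole name/value pairs instead of raw substrings.
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    return not any(pair in BLOCKED for pair in pairs)
```

Comparing parsed pairs avoids false hits on parameters whose names merely end in tbm or param1.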
Upvotes: 3
Reputation: 1146
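You should change the [\s]+ to .*? or [\S]*? and your regex will work. [\s]+ requires at least one whitespace character right after the ?, which a URL never has, so the negative lookahead always succeeds. To also match the whole URL when it fits the criteria, add another [\S]* at the end: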
^http(s):\/\/www.google.([\w\.]*)\/(search|webhp)\?(?![\S]*?(tbm=isch|tbm=news|param1=432))[\S]*
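To see the difference, the question's pattern and the corrected one can be run side by side. This is only a sketch in Python (the question does not name a language); the names ORIGINAL and FIXED are made up for the comparison.

```python
import re

# Pattern from the question: the lookahead demands whitespace, so it never blocks.
ORIGINAL = re.compile(
    r"^http(s):\/\/www.google.(.*)\/(search|webhp)\?"
    r"(?![\s]+(tbm=isch|tbm=news|param1=432))"
)

# Corrected pattern from this answer: the lookahead scans non-whitespace characters.
FIXED = re.compile(
    r"^http(s):\/\/www.google.([\w\.]*)\/(search|webhp)\?"
    r"(?![\S]*?(tbm=isch|tbm=news|param1=432))[\S]*"
)

sample = "https://www.google.com/search?q=test&tbm=isch&randomParameters=123"
plain = "https://www.google.com/search?q=test"

print(ORIGINAL.match(sample) is not None)  # the question's pattern still matches
print(FIXED.match(sample) is not None)     # the corrected pattern rejects it
print(FIXED.match(plain).group(0))         # the whole allowed URL is matched
```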
Upvotes: 1
Reputation: 32797
Your regex should be
^https:\/\/www.google.([^\/]*)\/(search|webhp)\?(?!.*(tbm\=isch|tbm\=news|param1\=432)).*$
The issue was that your lookahead used [\s]+, which matches only whitespace, instead of .*, which matches any number of characters.
Also, www.google.(.*) would have caused a lot of backtracking and hurt performance, so I have replaced it with www.google.([^\/]*)
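Apart from performance, the two groups also accept different URLs, because (.*) is free to run across slashes. A small Python sketch (the language, the names LOOSE and STRICT, and the example URL are assumptions for illustration) shows this on the prefix of each pattern:

```python
import re

# Prefix with the greedy group from the question.
LOOSE = re.compile(r"^https:\/\/www.google.(.*)\/(search|webhp)\?")
# Prefix with the group restricted to non-slash characters.
STRICT = re.compile(r"^https:\/\/www.google.([^\/]*)\/(search|webhp)\?")

url = "https://www.google.com/maps/search?q=test"

loose_match = LOOSE.match(url)
strict_match = STRICT.match(url)

print(loose_match.group(1))   # the group swallowed an extra path segment
print(strict_match is None)   # the restricted group refuses this URL
```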
Edit
I'm wondering why you are using a regex for this instead of a simple indexOf or a similar method from the language you are using. Is there a special use case here?
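For comparison, a substring check of the kind suggested here could be sketched as follows in Python, where the in operator plays the role of indexOf. The names BLOCKED and has_blocked_param are hypothetical.

```python
# Hypothetical list of query fragments that should disqualify a URL.
BLOCKED = ("tbm=isch", "tbm=news", "param1=432")


def has_blocked_param(url):
    """Return True when the query string contains any blocked fragment."""
    # partition splits on the first "?"; query is empty when there is none.
    _, _, query = url.partition("?")
    return any(fragment in query for fragment in BLOCKED)
```

This is a plain substring test, so it would also flag a parameter such as xtbm=isch or param1=4321; whether that matters depends on the URLs being filtered.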
Upvotes: 2