Reputation: 1678
I'm trying to match an expression with a regex, but it's not working. I want to match a string not starting with http://www.domain.com. Here is my regex:
^https?:\/\/(www\.)?(?!domain\.com)
Is there a problem with my regex?
I want to match expressions that start with http:// but are different from http://site.com. For example:
/page.html => false
http://www.google.fr => true
http://site.com => false
http://site.com/page.html => false
Upvotes: 0
Views: 7972
Reputation: 1374
The problem here is that when the negative look-ahead rejects the match (as intended), the regex engine backtracks into the previous group, (www\.), which is quantified as optional, and checks whether the expression succeeds without it. It does: with the group skipped, the look-ahead sees www.domain.com rather than domain.com, so it passes. This is what you have overlooked.
This could be fixed with atomic grouping or possessive quantifiers, which make the engine 'forget' the possibility of backtracking. Unfortunately, Python's re module did not support these natively (they were only added in Python 3.11). Instead, you'll have to use a less efficient method: a larger look-ahead that contains the optional group.
^https?:\/\/(?!(www\.)?(domain\.com))
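To see the backtracking described above, both patterns can be run through Python's re module. This is only a minimal sketch: the names `original`, `fixed`, `urls` and `results`, and the sample URLs, are chosen for illustration and are not from the question.

```python
import re

# Pattern from the question: the optional group sits outside the look-ahead,
# so the engine can backtrack and skip "www." to satisfy the look-ahead.
original = re.compile(r'^https?:\/\/(www\.)?(?!domain\.com)')
# Pattern from this answer: the optional group is inside the look-ahead.
fixed = re.compile(r'^https?:\/\/(?!(www\.)?(domain\.com))')

urls = [
    'http://www.domain.com',  # should be rejected
    'http://domain.com',      # should be rejected
    'http://www.google.fr',   # should be accepted
]

# Map each URL to (matched by original?, matched by fixed?).
results = {
    url: (bool(original.search(url)), bool(fixed.search(url)))
    for url in urls
}
for url, (orig_ok, fixed_ok) in results.items():
    print(url, 'original:', orig_ok, 'fixed:', fixed_ok)
```

With these inputs, only the `www.` form slips through the original pattern, which is consistent with the backtracking explanation.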
Upvotes: 1
Reputation: 1122342
You want a negative look-ahead assertion:
^https?://(?!(?:www\.)?site\.com).+
Which gives:
>>> import re
>>> testdata = '''\
... /page.html => false
... http://www.google.fr => true
... http://site.com => false
... http://site.com/page.html => false
... '''.splitlines()
>>> not_site_com = re.compile(r'^https?://(?!(?:www\.)?site\.com).+')
>>> for line in testdata:
...     match = not_site_com.search(line)
...     if match: print(match.group())
...
http://www.google.fr => true
Note that the pattern excludes both www.site.com and site.com:
>>> not_site_com.search('https://www.site.com')
>>> not_site_com.search('https://site.com')
>>> not_site_com.search('https://site-different.com')
<_sre.SRE_Match object at 0x10a548510>
Upvotes: 0
Reputation: 1667
Use this to match a URL that does not have the domain you mention: https?://(?!(www\.domain\.com\/?)).*
Example in action: http://regexr.com?34a7p
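For readers who prefer to check this locally, here is a small sketch in Python. The names `not_www_domain`, `samples` and `matched`, and the sample inputs, are illustrative assumptions; only the pattern itself comes from this answer.

```python
import re

# Pattern copied from this answer. It has no ^ anchor, so re.match is used
# here to require the match to begin at the start of the string.
not_www_domain = re.compile(r'https?://(?!(www\.domain\.com\/?)).*')

samples = [
    '/page.html',
    'http://www.google.fr',
    'http://www.domain.com',
    'http://domain.com',
]

# Keep only the inputs the pattern accepts.
matched = [s for s in samples if not_www_domain.match(s)]
print(matched)
```

With these inputs, the bare `http://domain.com` is still accepted, because the look-ahead names only the `www.` form. Making `www\.` optional inside the look-ahead would cover both.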
Upvotes: 7