Reputation: 1639
I wrote this regex for Python's re module which, as far as I know, works as expected:
^(https?://)([\w\.-]+)[\./]*(?(1)(domain-name.com))
Run against a list of URLs, it matches only the ones containing domain-name.com. But I don't understand why:
^(https?://)([\w\.-]+)[\./]*(?(1)(!(domain-name.com)))
does not return all the other URLs. In fact, it never matches anything.
Thank you
on pythex
Upvotes: 0
Views: 96
Reputation: 22837
To match domain-name.com domains, use the following.
^https?://(?:\w+(?:-\w+)*\.)*domain-name\.com(?=$|/)
^ asserts position at the start of the line.
https? matches http or https (the s is optional).
:// matches this literally.
(?:\w+(?:-\w+)*\.)* matches any number of subdomains. A subdomain cannot begin or end with -, so this subpattern works as follows:
    \w+ matches one or more word characters.
    (?:-\w+)* matches the following any number of times:
        - matches this literally.
        \w+ matches one or more word characters.
    \. matches the dot character literally.
domain-name\.com matches domain-name.com literally.
(?=$|/) is a positive lookahead ensuring that either the end of the line or a / follows. Use (?=$|[/?#]) instead to also accept a query string or fragment right after the host.

To match non-domain-name.com domains, use the following.
^https?://(?:\w+(?:-\w+)*\.)*(?!domain-name\.com)[\w-]+\.[\w-]+(?=$|/)
This is the same as the first pattern except it uses (?!domain-name\.com)[\w-]+\.[\w-]+. This matches any domain that doesn't match domain-name.com literally.
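As a rough check, here is a minimal sketch that runs both patterns from this answer over a short list, assuming Python's re module. The sample URLs and the split_urls helper are made up for illustration and are not from the question.

```python
import re

# Patterns copied from the answer above.
MATCH_DOMAIN = re.compile(r"^https?://(?:\w+(?:-\w+)*\.)*domain-name\.com(?=$|/)")
MATCH_OTHER = re.compile(
    r"^https?://(?:\w+(?:-\w+)*\.)*(?!domain-name\.com)[\w-]+\.[\w-]+(?=$|/)"
)

# Hypothetical sample URLs, invented for this sketch.
urls = [
    "http://domain-name.com",
    "https://www.domain-name.com/page",
    "http://example.org",
    "https://sub.example.org/path",
]


def split_urls(candidates):
    """Return (URLs on domain-name.com, URLs on any other domain)."""
    on_domain = [u for u in candidates if MATCH_DOMAIN.search(u)]
    elsewhere = [u for u in candidates if MATCH_OTHER.search(u)]
    return on_domain, elsewhere


on_domain, elsewhere = split_urls(urls)
print(on_domain)   # the two domain-name.com URLs
print(elsewhere)   # the two example.org URLs
```

Keeping the two patterns separate, rather than negating the first in code, mirrors the answer; in practice `not MATCH_DOMAIN.search(u)` would likely be the simpler way to get "everything else".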
Upvotes: 1
Reputation: 226
A bare ! is just a literal exclamation mark, which is why your second pattern never matches. You need a negative lookahead, written with ?! instead of !:
^(https?://)([\w\.-]+)[\./]*(?(1)(?!(domain-name.com)))
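To see the difference, here is a small sketch assuming Python's re module; the sample URLs are invented for illustration. It compares the literal-! pattern from the question with the lookahead version. One caveat worth checking in your own data: because [\w\.-]+ is greedy, it has usually consumed the whole host before the lookahead is tested, so the lookahead version appears to match the domain-name.com URL as well and may not filter it out on its own.

```python
import re

# The question's second pattern: "!" is a literal exclamation mark here.
LITERAL_BANG = re.compile(r"^(https?://)([\w\.-]+)[\./]*(?(1)(!(domain-name.com)))")
# The pattern suggested in this answer, using a negative lookahead.
LOOKAHEAD = re.compile(r"^(https?://)([\w\.-]+)[\./]*(?(1)(?!(domain-name.com)))")

# Hypothetical sample URLs; the last one contains a literal "!" on purpose.
urls = [
    "http://www.domain-name.com/",
    "http://example.org/",
    "http://x!domain-name.com",
]

bang_hits = [u for u in urls if LITERAL_BANG.search(u)]
lookahead_hits = [u for u in urls if LOOKAHEAD.search(u)]

print(bang_hits)       # only the URL that really contains "!domain-name.com"
print(lookahead_hits)  # every URL in this sample, including the domain-name.com one
```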
Upvotes: 0