Reputation: 73
I need a regular expression that finds domain names that don't start with the string "http".
I found a regex that almost does this:
(?:[a-zA-Z0-9](?:[a-zA-Z0-9\-]{,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,6}
But it also matches the domain in "https://domain1.com".
Example given:
https://regex101.com/r/DjDBrx/1/
In this example I want to avoid "https://domain1.com"
Any help would be greatly appreciated.
Upvotes: 2
Views: 941
Reputation: 627537
You can use a word boundary coupled with two negative lookbehinds:
\b(?<!http:\/\/)(?<!https:\/\/)(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]{,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,6}\b
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                ^^
The (?<!http:\/\/)(?<!https:\/\/) part is two negative lookbehinds. Because lookarounds are non-consuming patterns, both are checked at the same location in the string: once \b has confirmed that the location is a word boundary, they fail the match if there is http:// or https:// immediately to the left of the current location.
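As a rough illustration of how the two lookbehinds behave, here is a minimal Python sketch. The names `DOMAIN_NOT_AFTER_SCHEME` and `find_bare_domains` and the sample strings are made up for this example. It also assumes two cosmetic changes to the pattern: `{,61}` is written as `{0,61}` (some engines treat `{,61}` as literal text), and the slashes are left unescaped, which Python allows.

```python
import re

# Pattern from the answer, adapted for Python (assumptions: {0,61} instead of
# {,61}, and "/" left unescaped). Both lookbehinds are fixed-width, which
# Python's re module requires.
DOMAIN_NOT_AFTER_SCHEME = re.compile(
    r"\b(?<!http://)(?<!https://)"
    r"(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,6}\b"
)


def find_bare_domains(text):
    """Return domain-like matches that do not directly follow http:// or https://."""
    # The pattern has no capturing groups, so findall returns whole matches.
    return DOMAIN_NOT_AFTER_SCHEME.findall(text)


sample = "domain0.com https://domain1.com http://domain2.org sub.domain3.net"
print(find_bare_domains(sample))  # ['domain0.com', 'sub.domain3.net']

# Caveat: the lookbehinds only inspect the text immediately to the left of the
# match start, so a match can still begin at a later label of a URL's host.
print(find_bare_domains("https://www.domain4.com"))  # ['domain4.com']
```

The second call shows a limitation worth keeping in mind: with a subdomain after the scheme, the match can start at a later word boundary where neither lookbehind applies. If that matters for your input, the pattern likely needs a stricter left-hand check than the two lookbehinds alone.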
Upvotes: 1
Reputation: 56
You can use a negative lookahead; I think it is usually the quickest option.
It fails the match if the string starts with the text you are excluding, like:
^(?!(http)).*
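A small Python sketch of this approach, assuming each candidate is a separate string (the names `NOT_HTTP_PREFIX` and `keep_non_http` and the sample list are invented for illustration). Note that the lookahead is anchored at the start, so it only rejects strings that begin with "http"; it does not look for "http" later in the string.

```python
import re

# Same idea as the pattern above; the inner capturing group is dropped here
# because it is not needed for the check.
NOT_HTTP_PREFIX = re.compile(r"^(?!http).*")


def keep_non_http(candidates):
    """Keep only the strings that do not begin with "http"."""
    return [s for s in candidates if NOT_HTTP_PREFIX.match(s)]


candidates = [
    "domain0.com",
    "https://domain1.com",
    "http://domain2.org",
    "see https://domain3.net",
]
print(keep_non_http(candidates))  # ['domain0.com', 'see https://domain3.net']
```

The last entry is kept because "http" appears in the middle of the string rather than at the start, so this pattern fits whole-string or per-line filtering better than finding domains inside running text.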
Upvotes: 0