Daniel

Reputation: 73

Regular expression ignore string if starts with specific substring

I need a regular expression that finds domain names that don't start with the string "http".

I found a regex that almost does this:

(?:[a-zA-Z0-9](?:[a-zA-Z0-9\-]{,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,6}

But it also matches the domain inside "https://domain1.com".
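A minimal Python sketch that reproduces the problem. The sample text is made up for illustration (it is not the content of the regex101 link), and Python's `re` module is an assumption, since the question does not name a regex flavor:

```python
import re

# The pattern from the question, unchanged.
DOMAIN = re.compile(
    r"(?:[a-zA-Z0-9](?:[a-zA-Z0-9\-]{,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,6}"
)

# Hypothetical sample text: one URL and one bare domain.
text = "https://domain1.com domain2.com"

# The pattern has no notion of what precedes the match, so the domain
# inside the URL is returned alongside the bare one.
print(DOMAIN.findall(text))  # ['domain1.com', 'domain2.com']
```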

Example given:

https://regex101.com/r/DjDBrx/1/

In this example I want to avoid matching "https://domain1.com".

Any help would be greatly appreciated.

Upvotes: 2

Views: 941

Answers (2)

Wiktor Stribiżew

Reputation: 627537

You can use a word boundary coupled with two negative lookbehinds:

\b(?<!http:\/\/)(?<!https:\/\/)(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]{,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,6}\b
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                ^^

(?<!http:\/\/) and (?<!https:\/\/) are two negative lookbehinds. Because lookarounds are non-consuming, both are checked at the same location in the string: once \b has confirmed that the location is a word boundary, they fail the match if http:// or https:// is immediately to the left of it.
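As a rough check of the pattern above, here is a small Python sketch. The sample strings are invented for illustration, and Python's `re` is an assumption about the flavor (it accepts both fixed-width lookbehinds and the `{,61}` quantifier):

```python
import re

# The pattern from this answer, unchanged.
BARE_DOMAIN = re.compile(
    r"\b(?<!http:\/\/)(?<!https:\/\/)"
    r"(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]{,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,6}\b"
)

# Hypothetical sample text: two URLs and one bare domain.
text = "https://domain1.com http://domain2.org domain3.net"

# Only the domain that is not directly preceded by a scheme is returned.
print(BARE_DOMAIN.findall(text))  # ['domain3.net']

# The lookbehinds only inspect the characters immediately to the left,
# so a URL with a subdomain can still yield a partial match.
print(BARE_DOMAIN.findall("https://www.domain1.com"))  # ['domain1.com']
```

The second call shows a case to watch for: after "www." there is a word boundary that is not directly preceded by the scheme, so the rest of the host name still matches.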

Upvotes: 1

Themathix

Reputation: 56

You can use a negative lookahead; I think it is usually the quickest option. It fails the match if the string starts with the text you are excluding, for example: ^(?!(http)).*
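A short Python sketch of how this pattern behaves, assuming each candidate is tested as a separate string (the list below is invented for illustration):

```python
import re

# The pattern from this answer: succeed only if the string does not
# begin with "http".
NOT_HTTP = re.compile(r"^(?!(http)).*")

# Hypothetical candidates, one per string.
candidates = ["https://domain1.com", "domain2.com", "http://domain3.org"]

# match() anchors at the start, where the lookahead rejects "http...".
kept = [c for c in candidates if NOT_HTTP.match(c)]
print(kept)  # ['domain2.com']
```

Because the pattern is anchored with ^, it filters whole strings (or whole lines with `re.MULTILINE`); it does not by itself pick domains out of the middle of a longer text.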

Upvotes: 0
