Reputation: 83
I have a regex to find URLs in text:
^(?!:\/\/)([a-zA-Z0-9-_]+\.)*[a-zA-Z0-9][a-zA-Z0-9-_]+\.[a-zA-Z]{2,11}?$
However, it fails when the URL is surrounded by other text:
https://regex101.com/r/0vZy6h/1
I can't seem to grasp why it's not working.
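Here is a minimal reproduction in Python. This assumes Python's re module, because the question only links a regex101 demo and does not name the engine. Without the MULTILINE flag, ^ and $ anchor to the start and end of the whole input, so the pattern is found only when the input contains nothing but the domain. The helper name found is hypothetical:

```python
import re

# The pattern from the question, copied unchanged (anchored with ^ and $).
ANCHORED = r"^(?!:\/\/)([a-zA-Z0-9-_]+\.)*[a-zA-Z0-9][a-zA-Z0-9-_]+\.[a-zA-Z]{2,11}?$"


def found(text):
    """Return True if re.search finds ANCHORED anywhere in text."""
    return re.search(ANCHORED, text) is not None


print(found("example.com"))               # True: the input is only the domain
print(found("see example.com for more"))  # False: ^ and $ reject the surrounding text
```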
Upvotes: 1
Views: 353
Reputation: 627101
Possible reasons why the pattern does not work:
- ^ and $ make it match the entire string, so nothing is found when the URL is surrounded by other text.
- (?!:\/\/) is a negative lookahead that fails the match if there is a :// substring immediately to the right of the current location. But [a-zA-Z0-9-_]+ (and [a-zA-Z0-9]) cannot match a colon, so there can't be a :// there anyway. You most probably wanted to fail the match if :// is present to the left of the current location. In that case you need a negative lookbehind, (?<!:\/\/).
- Once $ is removed, [a-zA-Z]{2,11}? matches only 2 chars. This is because {2,11}? is a lazy quantifier, and a lazy quantifier at the very end of a pattern always matches the minimum number of chars, here, 2.

Use
(?<!:\/\/)([a-zA-Z0-9-_]+\.)*[a-zA-Z0-9][a-zA-Z0-9-_]+\.[a-zA-Z]{2,11}
See the regex demo. Add \b word boundaries if you need to match the substrings as whole words.
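You can check the points above with a short Python sketch. It assumes Python's re module and uses a made-up sample sentence. LAZY_TAIL keeps the lazy {2,11}? with the anchors removed. FIXED is the pattern suggested above, and WHOLE_WORD wraps it in \b word boundaries. The helper find_all is hypothetical:

```python
import re
from typing import List

# Answer's pattern, but keeping the lazy {2,11}? from the question.
LAZY_TAIL = r"(?<!:\/\/)([a-zA-Z0-9-_]+\.)*[a-zA-Z0-9][a-zA-Z0-9-_]+\.[a-zA-Z]{2,11}?"
# Suggested pattern: lookbehind instead of lookahead, greedy {2,11}, no anchors.
FIXED = r"(?<!:\/\/)([a-zA-Z0-9-_]+\.)*[a-zA-Z0-9][a-zA-Z0-9-_]+\.[a-zA-Z]{2,11}"
# FIXED wrapped in \b word boundaries so only whole words are matched.
WHOLE_WORD = r"\b" + FIXED + r"\b"


def find_all(pattern: str, text: str) -> List[str]:
    """Return every full match of pattern in text, left to right."""
    # re.findall would return only the captured group here, because the
    # pattern contains one, so take group(0) from finditer instead.
    return [m.group(0) for m in re.finditer(pattern, text)]


text = "Visit www.example.com or https://python.org today"

print(find_all(LAZY_TAIL, text))   # ['www.example.co', 'ython.or']
print(find_all(FIXED, text))       # ['www.example.com', 'ython.org']
print(find_all(WHOLE_WORD, text))  # ['www.example.com']
```

The second result shows why \b can matter. The lookbehind only rejects a match that starts right after ://. Without word boundaries, the engine simply starts one character later and returns ython.org.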
Note that in Python regex there is no need to escape /, so you may replace (?<!:\/\/) with (?<!://).
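Here is a quick check of the escaping note, assuming Python's re module. Both spellings of the lookbehind should behave the same on a few made-up sample strings. Escaping / only matters where / delimits the regex itself, such as a JavaScript /.../ literal:

```python
import re

# Escaped form, as it would be written inside a /.../ delimited regex.
ESCAPED = r"(?<!:\/\/)([a-zA-Z0-9-_]+\.)*[a-zA-Z0-9][a-zA-Z0-9-_]+\.[a-zA-Z]{2,11}"
# Unescaped form: "/" is an ordinary character in Python's re syntax.
PLAIN = r"(?<!://)([a-zA-Z0-9-_]+\.)*[a-zA-Z0-9][a-zA-Z0-9-_]+\.[a-zA-Z]{2,11}"

samples = ["Visit www.example.com today", "https://python.org", "a.b.example.org"]

for s in samples:
    escaped_hits = [m.group(0) for m in re.finditer(ESCAPED, s)]
    plain_hits = [m.group(0) for m in re.finditer(PLAIN, s)]
    # Both spellings describe the same lookbehind, so the results agree.
    assert escaped_hits == plain_hits, (s, escaped_hits, plain_hits)
    print(s, "->", plain_hits)
```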
Upvotes: 1
Reputation: 5165
The spaces are not being matched. Try adding a space to the character sets that check for leading or trailing text.
Upvotes: 0