Reputation: 83

Regex in middle of text doesn't match

I have a regex to find URLs in text:

^(?!:\/\/)([a-zA-Z0-9-_]+\.)*[a-zA-Z0-9][a-zA-Z0-9-_]+\.[a-zA-Z]{2,11}?$

However, it fails when the URL is surrounded by other text.

I can't seem to grasp why it's not working.

Upvotes: 1

Answers (2)

Reputation: 627101

Possible reasons why the pattern does not work:

^ and $ anchor the pattern to the start and end of the string, so it can only match when the entire string is a URL.
(?!:\/\/) is a negative lookahead that fails the match if the substring :// appears immediately to the right of the current location. But the next thing the pattern has to match is a character from [a-zA-Z0-9-_], which can never be : or /, so the lookahead has no effect. You most probably wanted to fail the match if :// is present to the left of the current location, i.e. you want a negative lookbehind, (?<!:\/\/).
[a-zA-Z]{2,11}? matches only 2 chars once $ is removed, because {2,11}? is a lazy quantifier, and a lazy quantifier at the very end of a pattern always matches the minimum number of characters, here 2.
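The anchor and lazy-quantifier points can be checked with a short Python sketch. The sample string and the variable names are made up for illustration; the patterns are the one from the question and two variants with the anchors (and then the ?) removed.

```python
import re

# Hypothetical sample input: a URL surrounded by other text.
text = "visit example.com today"

# Pattern from the question: ^ and $ require the whole string to be a URL.
anchored = r"^(?!:\/\/)([a-zA-Z0-9-_]+\.)*[a-zA-Z0-9][a-zA-Z0-9-_]+\.[a-zA-Z]{2,11}?$"
print(re.search(anchored, text))        # None

# Anchors removed, lazy {2,11}? kept: the TLD part stops at 2 characters.
lazy = r"(?!:\/\/)([a-zA-Z0-9-_]+\.)*[a-zA-Z0-9][a-zA-Z0-9-_]+\.[a-zA-Z]{2,11}?"
print(re.search(lazy, text).group())    # example.co

# Anchors removed and quantifier made greedy: the whole TLD is consumed.
greedy = r"(?!:\/\/)([a-zA-Z0-9-_]+\.)*[a-zA-Z0-9][a-zA-Z0-9-_]+\.[a-zA-Z]{2,11}"
print(re.search(greedy, text).group())  # example.com
```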

Use

(?<!:\/\/)([a-zA-Z0-9-_]+\.)*[a-zA-Z0-9][a-zA-Z0-9-_]+\.[a-zA-Z]{2,11}

See the regex demo. Add \b word boundaries if you need to match the substrings as whole words.

Note that in Python regex there is no need to escape /; you may replace (?<!:\/\/) with (?<!://).
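Here is a minimal sketch of how the pattern might be used in Python. Two things differ from the pattern above and are my own choices: the repeated group is non-capturing, (?:...), so that findall returns whole matches rather than the group's contents, and the names URL_RE, URL_RE_NO_BOUNDARY and the sample text are made up for the demonstration.

```python
import re

# Lookbehind version with \b word boundaries on both sides.
URL_RE = re.compile(
    r"\b(?<!://)(?:[a-zA-Z0-9-_]+\.)*[a-zA-Z0-9][a-zA-Z0-9-_]+\.[a-zA-Z]{2,11}\b"
)

# Same pattern without the word boundaries, for comparison.
URL_RE_NO_BOUNDARY = re.compile(
    r"(?<!://)(?:[a-zA-Z0-9-_]+\.)*[a-zA-Z0-9][a-zA-Z0-9-_]+\.[a-zA-Z]{2,11}"
)

# Hypothetical sample text with two bare domains and one after a scheme.
text = "Go to example.com or sub.domain.org, but not http://skipped.net"

print(URL_RE.findall(text))
# ['example.com', 'sub.domain.org']

print(URL_RE_NO_BOUNDARY.findall(text))
# ['example.com', 'sub.domain.org', 'kipped.net']
```

Without \b the lookbehind only blocks a match that starts directly after ://, so the regex engine simply retries one character later and picks up kipped.net. If domains that follow a scheme should be skipped entirely, the leading \b appears to be needed in this case.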

Upvotes: 1

Reputation: 5165

The spaces are not being matched. Try adding a space to the character sets that check for leading or trailing text.

Upvotes: 0