Reputation: 221
I have a regex that matches URLs in a string that are not between quotes. It works well, but I have a minor issue with it.
The part that deals with the quotes also captures the character immediately before the URL (it can be a whitespace), so that character ends up in the match in front of the URL (which usually starts with https).
Here is the regex:
/(?:^|[^"'])(ftp|http|https|file):\/\/[\S]+(\b|$)/gim
You can test it out and you will see this unwanted match happening in front of the URL (if you type anything in front of the URL of course).
How do I get the proper Full match?
Upvotes: 1
Views: 289
Reputation: 626689
The non-capturing group (?:^|[^"']) matches and consumes a char other than ' and " with the [^'"] negated character class. As that char is consumed, it is added to the whole match value. What a non-capturing group does not do is add the matched substring to a separate memory buffer, and thus you cannot access it separately after a match is found.
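To make the consumed character visible, here is a minimal sketch that runs the pattern from the question in JavaScript. The sample string and the variable names are made up for illustration.

```javascript
// Pattern copied from the question; `sample` is a hypothetical input.
const questionPattern = /(?:^|[^"'])(ftp|http|https|file):\/\/[\S]+(\b|$)/gim;
const sample = 'see https://example.com and "https://quoted.example"';

// m[0] is the whole match, including whatever the non-capturing group consumed.
const fullMatches = [...sample.matchAll(questionPattern)].map((m) => m[0]);

console.log(fullMatches);
// -> [ ' https://example.com' ]  (note the leading space)
```

The quoted URL is skipped as intended, but the space before the unquoted URL is part of the whole match.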
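The usual solutions are:
- Wrap the part you need in a capturing group and read Group 1 instead of the whole match: (?:^|[^"'])((?:ftp|https?|file):\/\/\S+)(?:\b|$)
- Replace the non-capturing group with the (?<!["']) negative lookbehind (if your regex engine supports lookbehinds), which only matches a location that is not immediately preceded with ' or ": (?<!["'])(?:ftp|https?|file):\/\/\S+(?:\b|$)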
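Both approaches can be sketched in JavaScript as follows. The input string and variable names are assumptions for illustration, and the lookbehind variant assumes an engine with lookbehind support (ES2018 or later).

```javascript
// Hypothetical input: one unquoted URL and one quoted URL.
const input = 'see https://example.com and "https://quoted.example"';

// Option 1: the preceding char is still consumed, but the URL is read from Group 1.
const withGroup = /(?:^|[^"'])((?:ftp|https?|file):\/\/\S+)(?:\b|$)/gim;
const urlsFromGroup = [...input.matchAll(withGroup)].map((m) => m[1]);

// Option 2: the lookbehind only checks the preceding char and does not consume it,
// so the whole match is the URL itself.
const withLookbehind = /(?<!["'])(?:ftp|https?|file):\/\/\S+(?:\b|$)/gim;
const urlsFromLookbehind = input.match(withLookbehind) ?? [];

console.log(urlsFromGroup);      // -> [ 'https://example.com' ]
console.log(urlsFromLookbehind); // -> [ 'https://example.com' ]
```

The capturing-group version works in any engine, while the lookbehind version keeps the full match clean, which is convenient when you cannot easily pick a group (for example, when passing the regex to a replace call).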
Upvotes: 2