Lahey

Reputation: 221

Regex match includes unwanted character

I have a regex that matches URLs in a string when they are not between quotes. It works great, but I have a minor issue with it.

The part that deals with the quotes also matches the character immediately before the URL (which can be a whitespace character), so that character ends up in front of the URL (usually in front of https).

Here is the regex:

/(?:^|[^"'])(ftp|http|https|file):\/\/[\S]+(\b|$)/gim

You can test it out and you will see this unwanted match happening in front of the URL (if you type anything in front of the URL of course).

How do I get the proper Full match?

Upvotes: 1

Views: 289

Answers (1)

Wiktor Stribiżew

Reputation: 626689

The non-capturing group (?:^|[^"']) matches and consumes a character other than ' and " via the [^"'] negated character class. Because that character is consumed, it becomes part of the whole match value. The only thing a non-capturing group does not do is store the matched substring in a separate memory buffer, so you cannot access it on its own after a match is found.
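To see the consumed character in the match, here is a minimal sketch that runs the pattern from the question against a made-up sample string (the string and the variable names are assumptions for illustration only):

```javascript
// Pattern copied from the question.
const re = /(?:^|[^"'])(ftp|http|https|file):\/\/[\S]+(\b|$)/gim;

// Hypothetical input: one unquoted URL and one quoted URL.
const text = 'see https://example.com and "https://quoted.example"';

// With the g flag, String.prototype.match returns the full match values.
const matches = text.match(re);

// The leading space was consumed by [^"'] and is part of the match.
console.log(JSON.stringify(matches)); // [" https://example.com"]
```

The quoted URL is skipped as intended, but the unquoted one comes back with the preceding space attached.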

The usual solutions are:
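Two approaches commonly used for this kind of problem are sketched below. They are an illustration under stated assumptions, not necessarily the exact code of this answer: either wrap the URL part in a capturing group and read group 1, or replace the consuming group with a negative lookbehind (?<!["']), which assumes an engine with lookbehind support (ES2018 or later). The sample string and variable names are hypothetical.

```javascript
// Hypothetical input with unquoted and quoted URLs.
const text = `see https://example.com and "https://quoted.example" or 'ftp://x.y' then file://host/path`;

// Approach 1: keep the consuming group, capture only the URL, read group 1.
const withGroup = /(?:^|[^"'])((?:ftp|https?|file):\/\/\S+(?:\b|$))/gim;
const urlsFromGroup = Array.from(text.matchAll(withGroup), (m) => m[1]);

// Approach 2: a negative lookbehind checks the previous character
// without consuming it, so the full match is just the URL.
const withLookbehind = /(?<!["'])(?:ftp|https?|file):\/\/\S+(?:\b|$)/gi;
const urlsFromLookbehind = text.match(withLookbehind) || [];

console.log(urlsFromGroup);
console.log(urlsFromLookbehind);
```

The capturing-group version works in older engines but requires reading m[1] instead of the full match. The lookbehind version gives a clean full match at the cost of needing a newer engine.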

Upvotes: 2
