Noobie

Negative lookahead Regex Issue

I started looking into lookaheads and tried to create a simple example, but for some reason it doesn't work properly when I use a negative lookahead.

I have the following simple regex:

href="(.+?)"(?!\s)

and this string:

<a href="test.com">test</a> 
<a href="test.com" title="title">test</a>

Testing environment: https://regex101.com/r/JztPUe/1

I'm trying to capture the URL inside the href only if its closing quote is not followed by a space, but the regex doesn't behave the way I expect: it matches both the first and the second URL.

When I change it to a positive lookahead it works as it should and matches only the second URL, but the negative one doesn't work as expected.

Can someone point out where my mistake is?

Answers (2)

x7ee1

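With space:

href="\K(\S+)"\s\K

Demo

Without space:

href="\K(\S+)">\K

Demo

\K resets the start of the reported match: whatever was matched before it is dropped from the overall match, while capture groups still keep their contents. It is supported by PCRE and Perl, but not by JavaScript or Python's built-in re module.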
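Because Python's built-in re module has no \K, the sketch below swaps in a different technique to get the same effect: a fixed-width lookbehind (?<=href=") keeps 'href="' out of the match, and a lookahead checks what follows the closing quote. The names WITH_SPACE and WITHOUT_SPACE are illustrative only, and [^"]+ replaces \S+ so the URL can never run past its closing quote.

```python
import re

# The two sample lines from the question.
lines = [
    '<a href="test.com">test</a>',
    '<a href="test.com" title="title">test</a>',
]

# Lookbehind stands in for \K (not available in Python's re module).
WITH_SPACE = r'(?<=href=")[^"]+(?="\s)'    # closing quote followed by whitespace
WITHOUT_SPACE = r'(?<=href=")[^"]+(?=">)'  # closing quote followed by '>'

for line in lines:
    a = re.search(WITH_SPACE, line)
    b = re.search(WITHOUT_SPACE, line)
    # First line prints: None test.com
    # Second line prints: test.com None
    print(a.group() if a else None, b.group() if b else None)
```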

Tim Biegeleisen

You should consider using an HTML parser instead of trying to do this with a regex. That being said, you could phrase your regex so that it requires the character after the closing quote of the href value to be something other than a space:

href="([^"]*)"[^ ]

Demo
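For the HTML-parser route recommended above, here is a minimal sketch using Python's standard html.parser module. It assumes that "the href is not followed by a space" can be read as "href is the last attribute of the a tag"; that reading is an interpretation, and it would differ from the regex for markup such as <a href="x" >. The class name LastHrefCollector is hypothetical.

```python
from html.parser import HTMLParser


class LastHrefCollector(HTMLParser):
    """Collects href values of <a> tags whose last attribute is href."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs in source order,
        # with attribute names lowercased by the parser.
        if tag == "a" and attrs and attrs[-1][0] == "href":
            self.urls.append(attrs[-1][1])


html = '''<a href="test.com">test</a>
<a href="test.com" title="title">test</a>'''

parser = LastHrefCollector()
parser.feed(html)
print(parser.urls)  # only the first link qualifies: ['test.com']
```

The parser works on attributes rather than raw characters, so quoting style and spacing inside the tag do not change which links are selected.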

Your current regex:

href="(.+?)"(?!\s)

works as expected on regex101 when the lazy (.+?) is replaced with a negated character class:

href="([^"]*)"(?!\s)

Demo
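To compare the two fixed patterns side by side, the sketch below runs them through Python's re module against the question's sample lines. Python is used only as a convenient engine here; the regex101 demos may use a different flavor, and the helper first_url is hypothetical.

```python
import re

lines = [
    '<a href="test.com">test</a>',
    '<a href="test.com" title="title">test</a>',
]

# [^"]* cannot match a quote, so the closing quote of the href value is
# fixed and backtracking cannot move it further to the right.
CONSUMING = r'href="([^"]*)"[^ ]'     # next char must exist and not be a space
LOOKAHEAD = r'href="([^"]*)"(?!\s)'   # next char must not be whitespace


def first_url(pattern, line):
    """Return the captured URL of the first match, or None."""
    m = re.search(pattern, line)
    return m.group(1) if m else None


for pattern in (CONSUMING, LOOKAHEAD):
    print([first_url(pattern, line) for line in lines])  # ['test.com', None]
```

The two differ at the edges: [^ ] consumes a character, so it fails when the quote is the last character of the input, and it excludes only a literal space, so a tab after the quote still matches. The lookahead consumes nothing and rejects any whitespace.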

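The issue you were having is caused by backtracking, not by the regex flavor. When the quote after test.com is followed by a space, the negative lookahead fails, but the engine does not give up at that position: it lets the lazy .+? keep expanding past the closing quote until it reaches a " that is not followed by whitespace. On the second line that is the quote before title, so the group captures test.com" title= and the line still matches. [^"]* can never match a quote, so it cannot run past the end of the href value. Lazy quantifiers such as .+? are supported by PCRE, JavaScript, Python and most other mainstream engines.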
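The backtracking described above can be observed directly. This sketch uses Python's re module as the engine (an assumption; the question's regex101 link may use another flavor) and prints what the original pattern actually captures on the second line, next to the same lazy quantifier without the lookahead.

```python
import re

line = '<a href="test.com" title="title">test</a>'

# The question's pattern: when '"' + space fails the lookahead, the engine
# backtracks and lets the lazy .+? grow past the closing quote.
original = re.search(r'href="(.+?)"(?!\s)', line)
print(original.group(1))  # test.com" title=

# Without the lookahead, the same lazy quantifier stops at the first quote,
# which shows the engine handles .+? correctly.
lazy_only = re.search(r'href="(.+?)"', line)
print(lazy_only.group(1))  # test.com
```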