Reputation: 31
I started looking into lookaheads and tried to create a simple example, but for some reason it doesn't work properly when I use a negative lookahead.
I have the following simple regex:
href="(.+?)"(?!\s)
and this string:
<a href="test.com">test</a>
<a href="test.com" title="title">test</a>
Testing environment: https://regex101.com/r/JztPUe/1
I'm trying to capture the URL inside the href only if it's not followed by a space, but the regex doesn't seem to do that, since it matches both the first and the second URL.
When I change it to a positive lookahead it works as it should and takes only the second URL, but the negative one does not work as expected.
Can someone point out where my mistake is?
Upvotes: 2
Views: 108
Reputation: 17
With space href="\K(\S+)"\s\K
demo
Without space href="\K(\S+)">\K
demo
\K
resets the start of the reported match, so everything matched before it is left out of the result.
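\K is a Perl/PCRE feature, and Python's built-in re module does not support it. As a rough equivalent, here is a minimal sketch that swaps in a lookbehind and a lookahead instead; the helper name matching_lines is made up for illustration, and the sample string is the one from the question.

```python
import re

# Sample input taken from the question.
sample = '<a href="test.com">test</a>\n<a href="test.com" title="title">test</a>'

# Lookbehind stands in for the leading \K; lookahead keeps the quote out of the match.
with_space = re.compile(r'(?<=href=")\S+(?="\s)')      # href value followed by a space
without_space = re.compile(r'(?<=href=")\S+(?=">)')    # href value followed by ">"


def matching_lines(pattern, text):
    """Hypothetical helper: return (line number, matched URL) per matching line."""
    return [
        (number, match.group())
        for number, line in enumerate(text.splitlines(), 1)
        for match in pattern.finditer(line)
    ]


print(matching_lines(with_space, sample))
print(matching_lines(without_space, sample))
```

Reporting the line number makes it visible which of the two anchors each pattern picks, since both URLs in the sample are identical.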
Upvotes: 1
Reputation: 520908
You should consider using an HTML parser instead of trying to do this with a regex. That being said, you could phrase your regex so that it insists that whatever follows the href clause is not a space:
href="([^"]*)"[^ ]
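For the HTML parser route, a minimal sketch using Python's standard html.parser module could look like the following. It assumes that "not followed by a space" can be approximated as "href is the last attribute of the tag", which is not exactly the same condition; the names HrefCollector and trailing_hrefs are made up for illustration.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collect href values from <a> tags where href is the last attribute."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs in source order.
        if tag == "a" and attrs and attrs[-1][0] == "href":
            self.hrefs.append(attrs[-1][1])


def trailing_hrefs(html):
    """Hypothetical helper: feed the markup and return the collected hrefs."""
    collector = HrefCollector()
    collector.feed(html)
    collector.close()
    return collector.hrefs


sample = '<a href="test.com">test</a>\n<a href="test.com" title="title">test</a>'
print(trailing_hrefs(sample))
```

Working on parsed attributes avoids the quoting and backtracking pitfalls of matching raw markup, at the cost of no longer seeing the exact whitespace in the source.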
Your current regex:
href="(.+?)"(?!\s)
works as expected in Regex 101 when slightly rewritten as this:
href="([^"]*)"(?!\s)
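The issue you were having appears to be caused by the lazy dot (.+?) rather than by the lookahead itself. Because . also matches ", the engine can backtrack when the lookahead fails and extend the capture past the first closing quote, up to the next " that is not followed by whitespace. [^"]* cannot cross a quote, so that escape route is closed.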
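To see what each pattern actually captures, the two can be run side by side. This is a minimal sketch using Python's re module (an assumption, since the question used regex101), with the sample string from the question.

```python
import re

# Sample input taken from the question.
html = '<a href="test.com">test</a>\n<a href="test.com" title="title">test</a>'

# Original pattern: the lazy dot may still consume double quotes.
lazy = re.compile(r'href="(.+?)"(?!\s)')

# Rewritten pattern: the capture can never cross a double quote.
strict = re.compile(r'href="([^"]*)"(?!\s)')

lazy_captures = lazy.findall(html)
strict_captures = strict.findall(html)

print(lazy_captures)    # two captures; the second runs past the first closing quote
print(strict_captures)  # a single capture, from the first anchor only
```

With this input the lazy pattern produces a second capture that continues into the title attribute, which is why the tool reported two matches.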
Upvotes: 1