WelcomeTo

Reputation: 20571

Mixing Lookahead and Lookbehind in 1 Regexp

I'm trying to match the first occurrence of window.location.replace("http://stackoverflow.com") in an HTML string.

Specifically, I want to capture the URL of the first window.location.replace entry in the whole HTML string.

To capture the URL, I formulated these 2 rules:

To achieve this, I think I need to use a lookbehind (for the 1st rule) and a lookahead (for the 2nd rule).

I ended up with this regex:

.+(?<=window\.location\.redirect\(\"?=\"\))

It doesn't work. I'm not even sure that it is legal to mix both rules like I did.

Can you please help me translate my rules into a regex? Other ways of doing this (without lookahead or lookbehind) are also appreciated.

Upvotes: 1

Views: 87

Answers (1)

Wiktor Stribiżew

Reputation: 626699

The pattern you wrote is not the one you need, as it matches something very different from what you expect: the text window.location.redirect("=") inside window.location.redirect("=") something. Even then, it will only work in PCRE/Python if you remove the ? before \", because lookbehinds must be fixed-width in those engines. With the ?, it will work in .NET regex, which allows variable-width lookbehinds.

If it is JS, note that lookbehinds are only available in engines that implement ES2018 or later; older JavaScript regex engines do not support them at all.

Instead, use a capturing group around the unknown part you want to get:

/window\.location\.redirect\("([^"]*)"\)/

or

/window\.location\.redirect\("(.*?)"\)/

See the regex demo

Omitting the /g modifier makes the regex match only the first occurrence. Access the value you need in Group 1.
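As a minimal sketch of reading Group 1 from the first match, assuming the code runs in JavaScript: the helper name `firstRedirectUrl` and the sample HTML are hypothetical, and the method name `redirect` follows the patterns above (swap in `replace` if that is what your HTML actually contains).

```javascript
// Hypothetical helper: returns the URL captured by Group 1 of the first match,
// or null when the HTML contains no matching call.
function firstRedirectUrl(html) {
  // No /g flag, so String.prototype.match returns the first match with its groups.
  const pattern = /window\.location\.redirect\("([^"]*)"\)/;
  const match = html.match(pattern);
  return match ? match[1] : null;
}

// Hypothetical sample input with two redirect calls.
const sampleHtml = [
  '<script>',
  'window.location.redirect("http://stackoverflow.com");',
  'window.location.redirect("http://example.com");',
  '</script>',
].join('\n');

console.log(firstRedirectUrl(sampleHtml)); // prints "http://stackoverflow.com"
```

Checking for `null` before reading `match[1]` avoids a TypeError when the HTML has no such call.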

The ([^"]*) captures 0+ characters other than a double quote (the URLs you need should not contain one). If your URLs may contain a ", use the second approach, as (.*?) matches any 0+ characters other than a newline, as few as possible, up to the first ").
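The difference between the two patterns can be illustrated with a short JavaScript sketch. The input string, with a double quote inside the URL, and the helper name `captureWith` are made up for illustration.

```javascript
// Hypothetical helper: applies a pattern and returns Group 1, or null on no match.
function captureWith(pattern, text) {
  const match = text.match(pattern);
  return match ? match[1] : null;
}

const negatedClass = /window\.location\.redirect\("([^"]*)"\)/;
const lazyDot = /window\.location\.redirect\("(.*?)"\)/;

// Hypothetical input whose URL itself contains double quotes.
const quoted = 'window.location.redirect("http://example.com/?q="a"")';

// [^"]* stops at the inner quote, which is not followed by ")", so nothing matches.
console.log(captureWith(negatedClass, quoted)); // prints null

// .*? keeps expanding until the first occurrence of ") and captures the inner quotes.
console.log(captureWith(lazyDot, quoted)); // prints http://example.com/?q="a"
```

For URLs without embedded quotes, both patterns capture the same text.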

Upvotes: 1
