Regular expression: lookahead assertion does not work as expected inside a lookbehind assertion

Question

I’m trying to write a regular expression that finds the sentence in a text that contains a specific word (‘JavaScript’ in this case). My approach has two steps: first match the part of the text that precedes the sentence, then isolate the sentence from the rest of the text.

To get the preceding part, I used this regular expression:

.*\.\s(?=.*?JavaScript)

Screenshot: the expression above with the matched example text

This works as expected. The match is the longest preceding part that does not include the sentence containing ‘JavaScript’. I used a lazy quantifier (.*?) inside the lookahead and a greedy quantifier (.*) for the main match, which seems appropriate for finding the longest part.
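For anyone who wants to reproduce the first step outside the online tester, here is a minimal sketch in JavaScript. It assumes the four sample sentences are joined by single spaces on one line (the original line breaks would stop `.` from crossing sentences unless the `s` flag is set); the variable names are only illustrative.

```javascript
// Assumption: the sample sentences sit on one line, separated by single
// spaces, so that `.` can run across sentence boundaries without the s flag.
const text = [
  "Edit the Expression & Text to see the matches.",
  "Roll over matches or the expression for details.",
  "PCRE & JavaScript flavors of RegEx are supported.",
  "Validate your expression with Tests mode.",
].join(" ");

// Step 1: the greedy `.*` backtracks to the last ". " that still has
// "JavaScript" somewhere after it.
const precedingPattern = /.*\.\s(?=.*?JavaScript)/;
const preceding = text.match(precedingPattern)[0];

console.log(JSON.stringify(preceding));
```

Under that assumption, `preceding` holds the first two sentences including the trailing space, which agrees with the first screenshot.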

To get the sentence from the rest of the text, I used this regular expression:

(?<=(.*\.\s(?=.*?JavaScript))).*?\.\s

Screenshot: the expression matches the desired sentence and the one before it

Screenshot: the expression matches only the first four characters of the previous sentence

However, I found that the preceding part identified by the lookbehind assertion differs from the result of the first step, even though both use exactly the same regular expression. It seems as if different lengths of preceding parts are selected simultaneously.
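The two-match result can be reproduced with the sketch below. It again assumes the sample sentences are joined by single spaces, uses the `g` flag so that every match is reported, and needs an engine with lookbehind support (for example a recent Node.js). The names `sample`, `sentencePattern`, and `results` are hypothetical.

```javascript
// Assumption: same single-line sample text as in the first step.
const sample = [
  "Edit the Expression & Text to see the matches.",
  "Roll over matches or the expression for details.",
  "PCRE & JavaScript flavors of RegEx are supported.",
  "Validate your expression with Tests mode.",
].join(" ");

// Step 2: the step-1 expression wrapped in a lookbehind, followed by a lazy
// match of one sentence. Group 1 captures what the lookbehind accepted.
const sentencePattern = /(?<=(.*\.\s(?=.*?JavaScript))).*?\.\s/g;

const results = [...sample.matchAll(sentencePattern)].map((m) => ({
  match: m[0],      // the sentence that was matched
  lookbehind: m[1], // the preceding part accepted by the lookbehind
}));

console.log(results);
```

With this input the expression yields two matches: the second sentence (with only the first sentence captured by the lookbehind) and the third sentence (with the first two sentences captured). This fits the general rule that a lookbehind only checks whether some preceding text fits at the current position, while the engine tries start positions from left to right, so the earliest position that passes wins rather than the one with the longest preceding part. Whether that fully accounts for the four-character match in the last screenshot is not something this sketch shows.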

I’m trying to understand what went wrong with my approach.

Here is the text I used in the screenshots. The third sentence is the one that should be extracted.

Edit the Expression & Text to see the matches.
Roll over matches or the expression for details.
PCRE & JavaScript flavors of RegEx are supported.
Validate your expression with Tests mode.
