I’m trying to write a regular expression that finds the sentence in a text that contains a specific word (‘JavaScript’ in this case). My approach is to first match the part of the text that precedes that sentence, and then isolate the sentence from the rest of the text.
To get the preceding part, I used this regular expression:
.*\.\s(?=.*?JavaScript)
This works as expected: it matches the longest preceding part that does not include the sentence containing ‘JavaScript’. I used a non-greedy quantifier inside the lookahead assertion and a greedy one in the main pattern, which seems appropriate for finding the longest part.
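For comparison, the two-step approach can be done procedurally instead of in a single pattern. This is a minimal sketch assuming a JavaScript engine (for example Node.js) and the sample text below joined into one line; the variable names are illustrative only. Step 1 uses the first pattern unchanged, and step 2 slices off the prefix and takes the first sentence of what is left:

```javascript
// Sample text from the question, joined into one line (an assumption).
const text =
  "Edit the Expression & Text to see the matches. " +
  "Roll over matches or the expression for details. " +
  "PCRE & JavaScript flavors of RegEx are supported. " +
  "Validate your expression with Tests mode.";

// Step 1: the first pattern. The greedy .* backtracks to the last ". "
// that still has "JavaScript" somewhere after it.
const prefixMatch = text.match(/.*\.\s(?=.*?JavaScript)/);
// No match means the word is in the first sentence (or not present at all).
const prefix = prefixMatch ? prefixMatch[0] : "";

// Step 2: drop the prefix and take the first sentence of the remainder,
// i.e. everything up to the first period followed by whitespace or the end.
const rest = text.slice(prefix.length);
const firstSentence = rest.match(/^.*?\.(?=\s|$)/);

// Guard against texts where "JavaScript" does not occur at all.
const sentence =
  firstSentence && firstSentence[0].includes("JavaScript")
    ? firstSentence[0]
    : null;

console.log(sentence); // PCRE & JavaScript flavors of RegEx are supported.
```

If ‘JavaScript’ occurs in several sentences, the greedy prefix makes this pick the last of them.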
To get the sentence from the rest of the text, I used this regular expression:
(?<=(.*\.\s(?=.*?JavaScript))).*?\.\s
However, the preceding part matched by the lookbehind assertion differs from the result of the first step, even though both use exactly the same regular expression. It seems as if preceding parts of different lengths are selected at the same time.
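The behaviour can be reproduced with a short sketch, assuming a JavaScript engine with ES2018 lookbehind support (for example a recent Node.js) and the sample text below joined into one line. It prints, for each match, what the capture group inside the lookbehind captured:

```javascript
// Sample text from the question, joined into one line (an assumption).
const text =
  "Edit the Expression & Text to see the matches. " +
  "Roll over matches or the expression for details. " +
  "PCRE & JavaScript flavors of RegEx are supported. " +
  "Validate your expression with Tests mode.";

// The second pattern; JavaScript accepts variable-length lookbehind.
const re = /(?<=(.*\.\s(?=.*?JavaScript))).*?\.\s/g;

// m[1] is the preceding part the lookbehind captured, m[0] the match itself.
// Two matches are found: the second sentence and the third sentence,
// each with a different (longer) captured prefix.
for (const m of text.matchAll(re)) {
  console.log(JSON.stringify(m[1]), "->", JSON.stringify(m[0]));
}
```

The lookbehind is tested separately at every possible start position, and nothing anchors its .* to one particular ". ", so every position directly after ". " that still has ‘JavaScript’ somewhere ahead qualifies. In this text that is the start of both the second and the third sentence, and the group captures a different prefix for each.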
I’m trying to understand what went wrong with my approach.
Here is the text I used in the screenshots. The third sentence is the one that should be extracted.
Edit the Expression & Text to see the matches.
Roll over matches or the expression for details.
PCRE & JavaScript flavors of RegEx are supported.
Validate your expression with Tests mode.
It's not clear whether your input is all on one line or spread over multiple lines, so here is a solution that extracts any sentence containing "JavaScript":
/[^.]*\bJavaScript\b[^.]*\./g
https://regex101.com/r/FECt9n/1
/ Opening delimiter
[^.]* Zero or more characters that are not a period
\bJavaScript\b The word "JavaScript", with a word boundary on each side
[^.]*\. Zero or more non-period characters, followed by the closing period
/g Closing delimiter, with the global flag
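A minimal usage sketch, assuming JavaScript and the sample text from the question with each sentence on its own line (the variable names are illustrative):

```javascript
// Sample text from the question, one sentence per line (an assumption).
const text =
  "Edit the Expression & Text to see the matches.\n" +
  "Roll over matches or the expression for details.\n" +
  "PCRE & JavaScript flavors of RegEx are supported.\n" +
  "Validate your expression with Tests mode.";

// The pattern from this answer.
const re = /[^.]*\bJavaScript\b[^.]*\./g;

// [^.] also matches line breaks and spaces, so each match starts right
// after the previous period; trim() strips that leading whitespace.
const sentences = (text.match(re) || []).map((s) => s.trim());

console.log(sentences); // one entry: PCRE & JavaScript flavors of RegEx are supported.
```

Because every period ends a "sentence" for this pattern, a sentence containing an abbreviation such as "e.g." would be cut short at that period.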