Reputation: 141
I am trying to extract text from a string and am having trouble with laziness/greediness.
In the example, the text I want to match is <b>I only want this piece</b>, so my regex lazily matches anything between <b> and </b> as long as it contains 'piece'.
The problem with my regex is that the matched text also includes <b>first</b>.
var text = "<b>first</b> <b>I only want this piece</b>";
var regX = /<b>.*?piece.*?<\/b>/;
var matches = text.match(regX);
Matched text:
"<b>first</b> <b>I only want this piece</b>"
Desired match:
"<b>I only want this piece</b>"
Upvotes: 0
Views: 331
Reputation: 1476
This excludes any HTML tag between <b> and 'piece', and might be a little more robust, depending on how predictable your string is:
var regX = /<b>(?:(?!<[^>]*>).)*piece.*?<\/b>/
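A quick check of this pattern against the question's string (a minimal sketch, runnable in Node.js or a browser console):

```javascript
// Tempered pattern from the answer: before each character between <b> and
// "piece" is consumed, the negative lookahead checks that no tag starts there.
const text = "<b>first</b> <b>I only want this piece</b>";
const regX = /<b>(?:(?!<[^>]*>).)*piece.*?<\/b>/;

const matches = text.match(regX);
console.log(matches[0]);    // <b>I only want this piece</b>
console.log(matches.index); // 13
```

The match starts at index 13, the second <b>, because the attempt starting at index 0 cannot get past the </b> that closes "first".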
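If you also want to match newline characters, use [\s\S] in place of the dot (.), because the dot does not match line terminators. Note that inside a character class the dot is just a literal '.', so [.\s\S] is redundant; [\s\S] on its own already matches any character:
var regX = /<b>(?:(?!<[^>]*>)[\s\S])*piece[\s\S]*?<\/b>/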
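To illustrate the newline case, here is a sketch using a made-up input (hypothetical, not from the question) where the wanted element spans two lines. It also shows the s (dotAll) flag, available since ES2018, as an alternative to [\s\S]:

```javascript
// Hypothetical input: the wanted <b> element contains a line break.
const multiline = "<b>first</b>\n<b>I only want\nthis piece</b>";

// [\s\S] matches any character, including "\n".
const withClass = /<b>(?:(?!<[^>]*>)[\s\S])*piece[\s\S]*?<\/b>/;

// Alternative (ES2018+): the s flag lets . match line terminators too.
const withDotAll = /<b>(?:(?!<[^>]*>).)*piece.*?<\/b>/s;

// Plain dot without the s flag: it stops at "\n", so nothing matches.
const plainDot = /<b>(?:(?!<[^>]*>).)*piece.*?<\/b>/;

// JSON.stringify makes the "\n" visible in the output.
console.log(JSON.stringify(multiline.match(withClass)[0]));  // "<b>I only want\nthis piece</b>"
console.log(JSON.stringify(multiline.match(withDotAll)[0])); // "<b>I only want\nthis piece</b>"
console.log(multiline.match(plainDot));                      // null
```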
Upvotes: 1
Reputation: 174766
Use a negated character class instead of the first .*?:
var regX = /<b>[^<>]*?piece.*?<\/b>/;
Why?
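Because with <b>.*?piece, the match attempt starts at the first <b> in the string, and .*? keeps extending until it reaches the text piece, regardless of what lies in between (here, </b> <b>). Being lazy only makes the match as short as possible from a given starting position; it does not make the engine choose a later starting position. [^<>]*? instead lazily matches zero or more characters that are neither < nor >, so it cannot step over </b>: the attempt at the first <b> fails, and the engine moves on to the second <b>.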
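A small sketch comparing the question's pattern with the negated-class version on the same string, printing where each match starts:

```javascript
const source = "<b>first</b> <b>I only want this piece</b>";

// The question's pattern: the leftmost attempt (index 0) succeeds, so the
// lazy .*? only shortens the match from there and crosses "</b> <b>".
const lazyOnly = /<b>.*?piece.*?<\/b>/;
const m = source.match(lazyOnly);
console.log(m.index); // 0
console.log(m[0]);    // <b>first</b> <b>I only want this piece</b>

// Negated class: [^<>]*? cannot consume < or >, so the attempt at index 0
// fails at "</b>" and the engine retries from the second <b>.
const negated = /<b>[^<>]*?piece.*?<\/b>/;
const n = source.match(negated);
console.log(n.index); // 13
console.log(n[0]);    // <b>I only want this piece</b>
```

Either [^<>]*? or a greedy [^<>]* works here; what matters is that the class excludes the characters that start and end a tag.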
Upvotes: 3