Reputation: 3410
I have the following text:
<def id="1">[<note>AA2</note>] Valer:<ex>asd</ex></def>
<def id="2">AWEs: [<note>DDD1</note>]:<ex>rfwc sdad</ex>[<note>CC#2</note>]:<ex>saq www</ex>[<note>POL1</note>]:<ex>Sasd.</ex></def>
<def id="3">Esd: [<note>AAA</note>]:<ex>qw wq.</ex>[<note>PS0</note>]:<ex>sad sadad.</ex></def>
<def id="4" type="L99">[<note>CARSF1</note>] asddds:<ex>ass www.</ex></def>
I'm trying to match only when there's a [ immediately after the opening def tag.
I have this pattern:
<def\s.*?>\[<note>(.*?)<\/note>\](.*?):<ex>(.*?)<\/ex><\/def>
But it matches all lines and I'm not really sure why.
Here's the demo
Upvotes: 0
Views: 34
Reputation: 726479
Non-greedy means "consume as little as possible to make a successful match". If making a successful match requires consuming additional characters, a non-greedy quantifier consumes as many characters as required, stopping as soon as possible.
In your case, the non-greedy .*? in the <def\s.*?> part keeps matching past the closing bracket > of the def tag, because otherwise there would be no match. On lines two and three it runs all the way to the > just before the second [<note>, and from there the rest of the pattern matches the remainder of the string.
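To see this happen, here is a small Python sketch that runs the pattern from the question against the second line of the sample text. Python's re module is my assumption (the question doesn't name a regex engine), and the variable names are purely illustrative.

```python
import re

# The pattern from the question: every wildcard is a non-greedy .*?
lazy_pattern = re.compile(
    r'<def\s.*?>\[<note>(.*?)</note>\](.*?):<ex>(.*?)</ex></def>'
)

# Second line of the sample text; "[" does NOT directly follow the def tag.
second_line = (
    '<def id="2">AWEs: [<note>DDD1</note>]:<ex>rfwc sdad</ex>'
    '[<note>CC#2</note>]:<ex>saq www</ex>'
    '[<note>POL1</note>]:<ex>Sasd.</ex></def>'
)

lazy_match = lazy_pattern.search(second_line)

# The first .*? was stretched past the end of the def tag, up to the ">"
# of the first </ex>, so group 1 holds the SECOND note, not the first.
print(lazy_match.group(1))  # CC#2

# The last .*? was stretched as well, swallowing the markup in between.
print(lazy_match.group(3))  # saq www</ex>[<note>POL1</note>]:<ex>Sasd.
```

The line still matches, but only because each .*? grew until the rest of the pattern could succeed.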
Here is how you can fix this problem:
<def\s[^>]*>\[<note>([^<]*)<\/note>\]([^<]*):<ex>([^<]*)<\/ex><\/def>
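The idea is to replace every non-greedy .*? with a greedy negated character class that has an explicit stop character (i.e. [^>]* or [^<]*, depending on the context), so the match can never run past the end of the current tag or into the next one.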
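As a quick check, the following Python sketch runs the corrected pattern over the four sample lines from the question. Again, Python's re module is an assumption on my part, and the variable names are only for illustration.

```python
import re

# Corrected pattern: each wildcard is a negated character class that
# cannot cross a ">" (inside the tag) or a "<" (inside the text).
fixed_pattern = re.compile(
    r'<def\s[^>]*>\[<note>([^<]*)</note>\]([^<]*):<ex>([^<]*)</ex></def>'
)

sample_lines = [
    '<def id="1">[<note>AA2</note>] Valer:<ex>asd</ex></def>',
    '<def id="2">AWEs: [<note>DDD1</note>]:<ex>rfwc sdad</ex>'
    '[<note>CC#2</note>]:<ex>saq www</ex>[<note>POL1</note>]:<ex>Sasd.</ex></def>',
    '<def id="3">Esd: [<note>AAA</note>]:<ex>qw wq.</ex>'
    '[<note>PS0</note>]:<ex>sad sadad.</ex></def>',
    '<def id="4" type="L99">[<note>CARSF1</note>] asddds:<ex>ass www.</ex></def>',
]

# Collect the 1-based numbers of the lines the corrected pattern matches.
matched_line_numbers = [
    number
    for number, text in enumerate(sample_lines, start=1)
    if fixed_pattern.search(text)
]
print(matched_line_numbers)  # [1, 4]

# Captured groups for the first line: note, label, example.
first_groups = fixed_pattern.search(sample_lines[0]).groups()
print(first_groups)  # ('AA2', ' Valer', 'asd')
```

Only the lines where [ directly follows the opening def tag match. The second group keeps its leading space; strip it afterwards if you don't want it.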
Upvotes: 1