Cornwell

Reputation: 3410

Regex match being greedy after using non-greedy operator

I have the following text:

<def id="1">[<note>AA2</note>] Valer:<ex>asd</ex></def>
<def id="2">AWEs: [<note>DDD1</note>]:<ex>rfwc sdad</ex>[<note>CC#2</note>]:<ex>saq www</ex>[<note>POL1</note>]:<ex>Sasd.</ex></def>
<def id="3">Esd: [<note>AAA</note>]:<ex>qw wq.</ex>[<note>PS0</note>]:<ex>sad sadad.</ex></def>
<def id="4" type="L99">[<note>CARSF1</note>] asddds:<ex>ass www.</ex></def>

I'm trying to match only the lines where a [ comes immediately after the opening def tag.

I have this pattern:

<def\s.*?>\[<note>(.*?)<\/note>\](.*?):<ex>(.*?)<\/ex><\/def>

But it matches all lines and I'm not really sure why.

Here's the demo

Upvotes: 0

Views: 34

Answers (2)

Sergey Kalinichenko

Reputation: 726479

Non-greedy means "consume as little as possible to make a successful match". If making a successful match requires consuming additional characters, a non-greedy quantifier consumes as many characters as required, stopping as soon as a match becomes possible.

In your case the non-greedy .*? in the <def\s.*?> part continues matching past the closing bracket >, because . also matches > and otherwise there would be no match. On lines two and three it goes all the way to the second note, the first place where >[<note> appears, and the remaining groups then match the rest of the string.
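To make this concrete, here is a small sketch that runs the original pattern over the four sample lines. Python's re module is my choice for illustration (the question does not name a regex flavor), and the variable names are mine.

```python
import re

# Sample lines copied from the question.
lines = [
    '<def id="1">[<note>AA2</note>] Valer:<ex>asd</ex></def>',
    '<def id="2">AWEs: [<note>DDD1</note>]:<ex>rfwc sdad</ex>[<note>CC#2</note>]:<ex>saq www</ex>[<note>POL1</note>]:<ex>Sasd.</ex></def>',
    '<def id="3">Esd: [<note>AAA</note>]:<ex>qw wq.</ex>[<note>PS0</note>]:<ex>sad sadad.</ex></def>',
    '<def id="4" type="L99">[<note>CARSF1</note>] asddds:<ex>ass www.</ex></def>',
]

# Original pattern from the question, unchanged.
pattern = re.compile(r'<def\s.*?>\[<note>(.*?)<\/note>\](.*?):<ex>(.*?)<\/ex><\/def>')

results = []
for line in lines:
    m = pattern.search(line)
    results.append(m.groups() if m else None)
    print(results[-1])
```

On lines two and three the first captured group is the second note (CC#2 and PS0), which shows that the first .*? has run past the end of the def tag.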

Here is how you can fix this problem:

<def\s[^>]*>\[<note>([^<]*)<\/note>\]([^<]*):<ex>([^<]*)<\/ex><\/def>

Demo.

The idea is to replace all non-greedy expressions with greedy expressions requiring an explicit stop (i.e. < or >, depending on the context).
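A quick way to check the fixed pattern is to run it over the same sample lines. This is a sketch in Python's re module, which I am assuming here; the question does not say which flavor is in use, and the variable names are mine.

```python
import re

# Sample lines copied from the question.
lines = [
    '<def id="1">[<note>AA2</note>] Valer:<ex>asd</ex></def>',
    '<def id="2">AWEs: [<note>DDD1</note>]:<ex>rfwc sdad</ex>[<note>CC#2</note>]:<ex>saq www</ex>[<note>POL1</note>]:<ex>Sasd.</ex></def>',
    '<def id="3">Esd: [<note>AAA</note>]:<ex>qw wq.</ex>[<note>PS0</note>]:<ex>sad sadad.</ex></def>',
    '<def id="4" type="L99">[<note>CARSF1</note>] asddds:<ex>ass www.</ex></def>',
]

# Every lazy .*? is replaced by a negated class that cannot cross a tag boundary.
fixed = re.compile(r'<def\s[^>]*>\[<note>([^<]*)<\/note>\]([^<]*):<ex>([^<]*)<\/ex><\/def>')

results = []
for line in lines:
    m = fixed.search(line)
    results.append(m.groups() if m else None)
    print(results[-1])
```

Only lines one and four match now, because [^>]* has to stop at the first > and the next character must then be [.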

Upvotes: 1

Vincent Biragnet

Reputation: 2998

Your first .*? should be [^>]*, so that it cannot run past the end of the def tag.
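As a sketch of what this one change does, the snippet below compares it with a pattern that also replaces the remaining lazy groups with [^<]*. Python's re module is assumed, and the extra input with id="5" is a hypothetical line of mine, not taken from the question.

```python
import re

# Only the first quantifier changed; the capture groups stay lazy.
first_only = re.compile(r'<def\s[^>]*>\[<note>(.*?)<\/note>\](.*?):<ex>(.*?)<\/ex><\/def>')
# Stricter variant: every lazy group replaced by a negated class.
strict = re.compile(r'<def\s[^>]*>\[<note>([^<]*)<\/note>\]([^<]*):<ex>([^<]*)<\/ex><\/def>')

# Line two from the question: should no longer match.
line2 = '<def id="2">AWEs: [<note>DDD1</note>]:<ex>rfwc sdad</ex>[<note>CC#2</note>]:<ex>saq www</ex>[<note>POL1</note>]:<ex>Sasd.</ex></def>'
# Hypothetical input: starts with [ but contains two note/ex pairs.
two_pairs = '<def id="5">[<note>A</note>]:<ex>x</ex>[<note>B</note>]:<ex>y</ex></def>'

line2_match = first_only.search(line2)
loose = first_only.search(two_pairs)
tight = strict.search(two_pairs)

print(line2_match)
print(loose.groups() if loose else None)
print(tight)
```

Changing the first quantifier is enough for the sample data. The lazy groups can still stretch across inner tags on input like the hypothetical line, so whether that matters depends on what the real data looks like.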

Upvotes: 1
