user3193986

Reputation: 3

Non-greedy match does not work

I have a string like this

<tag1>
    <tag1>
        any text
    </tag1>
    text
</tag1>

and I want to find the <tag1> element that contains the shortest text in this string.

I used the regex <tag1>.*?</tag1>, but instead of <tag1>any text</tag1> I got <tag1> <tag1>any text</tag1>. Here is the example.

Why doesn't it work, and what am I doing wrong?

Upvotes: 0

Views: 107

Answers (3)

Sujith PS

Reputation: 4864

You can use this simple pattern to solve your specific problem:

<tag1>[^<]*</tag1>

Upvotes: 1

stema

Reputation: 93086

It does not work because the regex engine starts matching at the first <tag1> it finds and then matches as little as possible from there, so it ends at the first </tag1>, resulting in "<tag1> <tag1>any text</tag1>". The non-greedy quantifier only affects where the match ends, not where it starts.

You can avoid matching across tags by using a negated character class:
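To make the leftmost-start behaviour concrete, here is a small sketch in Python (my choice of language, the question does not name one). The re.DOTALL flag is an assumption so that "." also matches the line breaks in the sample string.

```python
import re

# The sample string from the question, with its line breaks preserved.
text = """<tag1>
    <tag1>
        any text
    </tag1>
    text
</tag1>"""

# Assumption: re.DOTALL is used so that "." can cross line breaks.
lazy = re.search(r"<tag1>.*?</tag1>", text, re.DOTALL)

# The match begins at the outer <tag1> (offset 0) and stops at the
# first </tag1>, so it still contains both opening tags.
print(lazy.start())
print(repr(lazy.group()))
```

The lazy quantifier shortens the match from the right only; the engine never gives up an earlier start position in exchange for a shorter overall match.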

<tag1>[^<>]*</tag1>

See it on Regexr.
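A quick way to check the negated character class is the following Python sketch (Python is my assumption, any regex flavour with character classes should behave the same way here). No flags are needed because [^<>] already matches line breaks.

```python
import re

# The sample string from the question.
text = "<tag1>\n    <tag1>\n        any text\n    </tag1>\n    text\n</tag1>"

# [^<>]* cannot cross another tag, so the outer <tag1> can never match
# and the search moves on to the inner one.
inner = re.search(r"<tag1>[^<>]*</tag1>", text)
print(repr(inner.group()))

# Strip the tags and surrounding whitespace to get the bare text.
content = inner.group()[len("<tag1>"):-len("</tag1>")].strip()
print(content)
```

Note that this only finds elements whose content contains no tags at all, which is fine for the string in the question.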

The other possibility is to use a negative lookahead assertion and match the next character only if the opening tag does not start at that position:

(<tag1>)((?!\1).)*?</tag1>

See it on Regexr
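Here is a sketch of the lookahead variant in Python (again my assumption for the language). re.DOTALL is added so "." matches line breaks, and the second string with a <b> tag is a made-up example to show where this pattern differs from the negated character class.

```python
import re

# Assumption: re.DOTALL so that "." also matches newlines.
pattern = re.compile(r"(<tag1>)((?!\1).)*?</tag1>", re.DOTALL)

# The sample string from the question.
text = "<tag1>\n    <tag1>\n        any text\n    </tag1>\n    text\n</tag1>"
m = pattern.search(text)
print(repr(m.group(0)))

# Hypothetical input: the inner element contains a different tag.
# The lookahead only forbids a nested <tag1>, so <b> is allowed.
text2 = "<tag1><tag1>any <b>bold</b> text</tag1>text</tag1>"
m2 = pattern.search(text2)
print(m2.group(0))

# The negated character class finds nothing in this second string.
print(re.search(r"<tag1>[^<>]*</tag1>", text2))
```

So the lookahead version is the more permissive of the two: it rejects only a repeated opening <tag1>, not every "<" or ">".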

Upvotes: 0

Vasili Syrakis

Reputation: 9641

I would be able to help you if those tags were not nested inside themselves (the same tag).

It is generally a bad idea to do this type of thing with regex. You should get a proper parser to fit your requirements.
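As a rough illustration of the parser route, here is a sketch using Python's standard xml.etree.ElementTree. It assumes the input is well-formed XML, which holds for the string in the question but may not hold for real-world markup.

```python
import xml.etree.ElementTree as ET

# The sample string from the question (assumed to be well-formed XML).
doc = """<tag1>
    <tag1>
        any text
    </tag1>
    text
</tag1>"""

root = ET.fromstring(doc)

# root.iter("tag1") yields the root itself and every nested tag1 element.
# Elements with no children are the innermost ones.
leaves = [el for el in root.iter("tag1") if len(el) == 0]

# el.text is None for an empty element, hence the "or" fallback.
innermost_text = [(el.text or "").strip() for el in leaves]
print(innermost_text)
```

The parser tracks nesting for you, so the same code keeps working however deeply the tags are nested.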

Upvotes: 0
