Reputation: 3
I have a string like this:
<tag1> <tag1>any text</tag1> text</tag1>
and I want to find the <tag1> that contains the shortest text in this string.
I used the regex <tag1>.*?</tag1>, but instead of <tag1>any text</tag1> I got <tag1> <tag1>any text</tag1>. Here is the example.
Why doesn't it work, and what am I doing wrong?
Upvotes: 0
Views: 107
Reputation: 4864
You can use this simple regex to solve your specific problem:
<tag1>[^<]*</tag1>
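A minimal sketch of that pattern using Python's standard re module. The sample string is an assumption, reconstructed from the question. The second input, s2, is hypothetical. It shows that when there are several innermost elements, findall returns all of them, and you can then pick the one with the shortest text.

```python
import re

# Assumed sample input, reconstructed from the question.
s = "<tag1> <tag1>any text</tag1> text</tag1>"

# [^<]* matches any run of characters without '<', so a match can only
# cover a <tag1> element that contains no other tag (an innermost one).
pattern = re.compile(r"<tag1>[^<]*</tag1>")
print(pattern.findall(s))  # ['<tag1>any text</tag1>']

# Hypothetical input with two innermost elements: findall returns both,
# and min(..., key=len) picks the one with the shortest text.
s2 = "<tag1><tag1>ab</tag1><tag1>a</tag1></tag1>"
inner = pattern.findall(s2)
print(min(inner, key=len))  # <tag1>a</tag1>
```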
Upvotes: 1
Reputation: 93086
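It doesn't work because the regex engine starts matching at the first <tag1> and then matches as little as possible, so it stops at the first </tag1>, resulting in "<tag1> <tag1>any text</tag1>". A lazy quantifier only makes the match as short as possible from where it starts; it does not move the start to a later <tag1>.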
You can avoid matching across tags by using a negated character class:
<tag1>[^<>]*</tag1>
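The explanation and the fix can be checked side by side with a short Python sketch (standard re module). The sample string is assumed, reconstructed from the question. The greedy variant is included only for contrast.

```python
import re

# Assumed sample input, reconstructed from the question.
s = "<tag1> <tag1>any text</tag1> text</tag1>"

# Lazy quantifier: the match still begins at the leftmost <tag1> (index 0),
# and *? only makes it end at the first </tag1> after that start.
lazy = re.search(r"<tag1>.*?</tag1>", s)
print(lazy.start(), repr(lazy.group(0)))  # 0 '<tag1> <tag1>any text</tag1>'

# Greedy quantifier, for contrast: runs to the last </tag1>.
greedy = re.search(r"<tag1>.*</tag1>", s)
print(greedy.group(0) == s)  # True

# Negated character class: the body may not contain '<' or '>',
# so the match cannot run over the inner <tag1>.
fixed = re.search(r"<tag1>[^<>]*</tag1>", s)
print(repr(fixed.group(0)))  # '<tag1>any text</tag1>'
```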
The other possibility is to use a negative lookahead assertion and consume the next character only if the opening tag does not start at that position:
(<tag1>)((?!\1).)*?</tag1>
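A sketch of the lookahead version in Python, which supports a backreference like \1 inside a lookahead. The input strings are assumed for illustration. Unlike the negated character class, this pattern still accepts a lone '<' in the element's text. It only refuses a second opening <tag1>.

```python
import re

# Assumed sample input, reconstructed from the question.
s = "<tag1> <tag1>any text</tag1> text</tag1>"

# (?!\1) refuses to consume a character where the captured opening tag
# begins again, so a match cannot contain a second <tag1>.
pattern = re.compile(r"(<tag1>)((?!\1).)*?</tag1>")
m = pattern.search(s)
print(m.group(0))  # <tag1>any text</tag1>

# Hypothetical input: a lone '<' in the text is still allowed.
other = pattern.search("<tag1>1 < 2</tag1>")
print(other.group(0))  # <tag1>1 < 2</tag1>
```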
Upvotes: 0
Reputation: 9641
I could help you if those tags weren't nested inside tags of the same name.
It is generally a bad idea to do this kind of thing with regex. You should use a proper parser that fits your requirements.
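As one illustration of the parser approach, here is a sketch built on Python's standard-library html.parser.HTMLParser. The answer does not name a specific parser. HTMLParser is a lenient tokenizer rather than a full tree builder, and the class name ShortestTag1 is hypothetical. The sketch keeps a stack of open <tag1> elements so nesting is tracked explicitly, and it passes each element's text up to its parent.

```python
from html.parser import HTMLParser


class ShortestTag1(HTMLParser):
    """Hypothetical helper: collects the text of every <tag1>, nested or not."""

    def __init__(self):
        super().__init__()
        self.stack = []   # text buffers for the currently open <tag1> elements
        self.texts = []   # full text content of each closed <tag1>

    def handle_starttag(self, tag, attrs):
        if tag == "tag1":
            self.stack.append([])

    def handle_endtag(self, tag):
        if tag == "tag1" and self.stack:
            text = "".join(self.stack.pop())
            self.texts.append(text)
            # The closed element's text is also part of its parent's text.
            if self.stack:
                self.stack[-1].append(text)

    def handle_data(self, data):
        # Text goes to the innermost open <tag1>, if any.
        if self.stack:
            self.stack[-1].append(data)


parser = ShortestTag1()
parser.feed("<tag1> <tag1>any text</tag1> text</tag1>")  # assumed sample input
parser.close()
shortest = min(parser.texts, key=len)
print(shortest)  # any text
```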
Upvotes: 0