Reputation: 81
I have some text that contains XML, and I'm trying to match each group.
<DOCUMENT>
<TYPE>Some text
<SEQUENCE>1
<FILENAME>page.htm
<DESCRIPTION>some text
<TEXT>
<HTML>
<HEAD>
</DOCUMENT>
<DOCUMENT>
<TYPE>Some text 2
<SEQUENCE>1
<FILENAME>page2.htm
<DESCRIPTION>some text 2
<TEXT>
<HTML>
<HEAD>
</DOCUMENT>
I tried to capture each <DOCUMENT> ... </DOCUMENT> group, but I keep getting the whole text instead of each group.
I have tried (<DOCUMENT>)([^&]*)(<\DOCUMENT>) and a few others, but couldn't get separate groups.
Upvotes: 0
Views: 40
Reputation: 62688
In general, the form /start(.*?)end/ will work for PCRE non-greedy matching of text between start and end delimiters. If your text contains newlines, you may need to turn on the single-line (dot-all, /s) flag so that . also matches newlines; the multi-line flag only changes how ^ and $ behave. The ? is the ticket: it turns the match from greedy (matching from the start token to the last end token) to non-greedy (matching from the start token to the first instance of the end token).
That said, if this is actually a markup document, remember the golden rule: don't parse XML with regex.
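As a rough illustration of the non-greedy approach, here is a minimal Python sketch. It assumes Python's re module rather than PCRE itself (the syntax for this pattern is the same), and the sample text is a shortened copy of the input from the question. The names text, pattern and documents are just placeholders for this example.

```python
import re

# Shortened sample based on the text in the question.
text = """<DOCUMENT>
<TYPE>Some text
<SEQUENCE>1
<FILENAME>page.htm
</DOCUMENT>
<DOCUMENT>
<TYPE>Some text 2
<SEQUENCE>1
<FILENAME>page2.htm
</DOCUMENT>"""

# re.DOTALL lets "." match newlines; ".*?" is non-greedy, so each match
# stops at the first </DOCUMENT> instead of running to the last one.
pattern = re.compile(r"<DOCUMENT>(.*?)</DOCUMENT>", re.DOTALL)

# findall returns the captured group (the body) of every match.
documents = pattern.findall(text)

for number, body in enumerate(documents, start=1):
    print(f"--- document {number} ---")
    print(body.strip())
```

With the greedy form (.*) and the same flag, the single match would span from the first <DOCUMENT> to the last </DOCUMENT>, which is the "whole text" behaviour described in the question.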
Upvotes: 1