OanaV.

Reputation: 81

Is there a regex to find text between two tags?

I have some text that contains XML-like markup, and I'm trying to match each group.

<DOCUMENT>
<TYPE>Some text 
<SEQUENCE>1
<FILENAME>page.htm
<DESCRIPTION>some text
<TEXT>
<HTML>
<HEAD>
</DOCUMENT>

<DOCUMENT>
<TYPE>Some text 2
<SEQUENCE>1
<FILENAME>page2.htm
<DESCRIPTION>some text 2
<TEXT>
<HTML>
<HEAD>
</DOCUMENT>

I tried to capture each <DOCUMENT> ... </DOCUMENT> group, but I keep getting the whole text as a single match instead of one match per group. I have tried (<DOCUMENT>)([^&]*)(<\DOCUMENT>) and a few other patterns, but couldn't get separate groups.

Upvotes: 0

Views: 40

Answers (1)

Chris Heald

Reputation: 62688

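In general, the form /start(.*?)end/ will work for PCRE non-greedy matching of text between start and end delimiters. If your text contains newlines, you will also need the flag that lets . match them: that is the s (dot-all) flag in PCRE, while Ruby calls the same behavior the multi-line flag, /m. The ? is the ticket: it turns the match from greedy (matching from the start token to the last end token) to non-greedy (matching from the start token to the first instance of the end token). Note also that the closing tag uses a forward slash, so the end delimiter should be </DOCUMENT>, not <\DOCUMENT>.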
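
As a rough sketch of the non-greedy form above, here it is in Python. The language is an assumption, since the question doesn't name one, and the sample text is a shortened copy of the input from the question. The names text, DOCUMENT_RE and extract_documents are made up for illustration. In Python the flag that lets . match newlines is re.DOTALL.

```python
import re

# Shortened copy of the sample input from the question.
text = """<DOCUMENT>
<TYPE>Some text
<SEQUENCE>1
<FILENAME>page.htm
</DOCUMENT>

<DOCUMENT>
<TYPE>Some text 2
<SEQUENCE>1
<FILENAME>page2.htm
</DOCUMENT>
"""

# ".*?" is non-greedy, so each match stops at the first closing tag.
# re.DOTALL lets "." match newline characters as well.
DOCUMENT_RE = re.compile(r"<DOCUMENT>(.*?)</DOCUMENT>", re.DOTALL)


def extract_documents(source):
    """Return the body of each <DOCUMENT> ... </DOCUMENT> block (hypothetical helper)."""
    return [body.strip() for body in DOCUMENT_RE.findall(source)]


documents = extract_documents(text)

# For comparison: the greedy form swallows everything up to the last closing tag.
greedy = re.findall(r"<DOCUMENT>(.*)</DOCUMENT>", text, re.DOTALL)

print(len(documents), len(greedy))  # prints "2 1"
```

The greedy variant is included only to show the symptom from the question: one match spanning the whole text instead of one match per group.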

That said, if this is actually a markup document, remember the golden rule: don't parse XML with regex.

Upvotes: 1
