kevin

Reputation: 309

Regular Expression to match parent and sub node

I want to develop a regular expression to match this tag:

<claim-text>aaaaaaa
    <claim-text>bbbbbbb</claim-text>
    <claim-text>ccccccc</claim-text>
</claim-text>

I tried

<claim-text>(.*)</claim-text>

But only bbbbbbb and ccccccc are matched. Can I get some help to cover aaaaaaa as well?

Thanks

Upvotes: 1

Views: 122

Answers (2)

JGNI

Reputation: 4013

Do not under any circumstances try to parse HTML with a regex unless you wish to invoke rite 666 Ph'nglui mglw'nafh Cthulhu R'lyeh wgah'nagl fhtagn.

Use an HTML parsing library; see this page for some ways to do it.
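
To illustrate the parser route, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree. It assumes the snippet from the question is well-formed XML; the helper name claim_texts is made up for this example, and the asker's actual language and library are not stated in the question.

```python
import xml.etree.ElementTree as ET

# The snippet from the question, assumed to be well-formed XML.
SNIPPET = """<claim-text>aaaaaaa
    <claim-text>bbbbbbb</claim-text>
    <claim-text>ccccccc</claim-text>
</claim-text>"""


def claim_texts(xml_source):
    """Return the leading text of every <claim-text> element, parent first.

    Hypothetical helper for illustration only.
    """
    root = ET.fromstring(xml_source)
    # iter() yields the root itself and all descendants in document order,
    # so nesting depth does not matter here.
    return [(node.text or "").strip() for node in root.iter("claim-text")]


print(claim_texts(SNIPPET))  # ['aaaaaaa', 'bbbbbbb', 'ccccccc']
```

Note that node.text only holds the text before an element's first child; anything that follows a child element lives in that child's tail attribute.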

Upvotes: 1

Christoph Herold

Reputation: 1809

For a generic solution that handles any depth, you will need at least a stack, which is not available in most regular expression implementations. However, if you know the structure will only have the depth you specified, you could use something like this:

<claim-text>([^<\r\n]*)

You can see a working example here: https://regex101.com/r/kbDbwF/1

It searches for your opening tag and then captures everything up to the next opening or closing tag ([^<]) or up to the next line break ([^\r\n]). I have combined both character classes into a single one, [^<\r\n]. However, this is not a general solution!
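
As a rough sketch of how the pattern above behaves, here it is applied with Python's re module. The choice of Python and the helper name leading_texts are assumptions for illustration; the regex itself is the one from this answer.

```python
import re

# The pattern from the answer: the opening tag, then a captured run of
# characters that are neither '<' nor a line-break character.
CLAIM_TEXT = re.compile(r"<claim-text>([^<\r\n]*)")

SNIPPET = """<claim-text>aaaaaaa
    <claim-text>bbbbbbb</claim-text>
    <claim-text>ccccccc</claim-text>
</claim-text>"""


def leading_texts(source):
    """Return the text that directly follows each opening tag (hypothetical helper)."""
    return CLAIM_TEXT.findall(source)


print(leading_texts(SNIPPET))  # ['aaaaaaa', 'bbbbbbb', 'ccccccc']

# Why this is not general: text that follows a nested element is never captured.
print(leading_texts("<claim-text>a<claim-text>b</claim-text>tail</claim-text>"))  # ['a', 'b']
```

The second call shows the limitation: "tail" comes after a child element rather than directly after an opening tag, so the pattern skips it.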

Upvotes: 1
