Danis Lui

Reputation: 1

Remove a nested element using a regular expression

I am new to regex. I want to capture only the text portion of <firstpar>, or equivalently remove every <asmbly> element together with all of its child nodes and values. Can anyone show me how to do that? The following is a snapshot of the XML file. Thanks.

<?xml version="1.0" encoding="UTF-8"?>
<firstpar>
    <thumbcred>Sample 1 thumbcred</thumbcred>
    <asmbly>
       <caption>
           <p><work ty="drawing">Two Fabulous Animals</work>Sample 1 <e> sample 1caption </e></p>
        </caption>
        <credit>Paul Miller/AP</credit>
        <asset id="126099" hgt="450" wdth="289" tmstp="24-OCT-08"
            bintype="2" filename="images/sample126099.jpg" source="eb" bighgt="1600"
            bigwdth="1029" bigfilename="botany003.jpg"
            bigdeployfullfilename="/eb-media/99/126099-050-CAD1EF0A.jpg"
        />

        <copyright>Copyright © 1994-2013 Encyclopædia Britannica,  Inc.</copyright>
    </asmbly>

Sample firstpar text <e>Sample e</e> just some
text <sub>sample sub </sub><e>sample e text again</e> more text with sup sub e. 

    </firstpar>

Upvotes: 0

Views: 139

Answers (1)

Unfortunately, one of the known limitations of regex is that it cannot reliably match arbitrarily nested structures such as XML elements.

You can and should use whatever XML parser is available in whatever language you're using.
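
As a rough illustration of the parser route, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree. The language choice, the trimmed-down sample, and the helper name text_without_asmbly are my own assumptions, not something from the question. One detail to watch: ElementTree stores the text that follows a closing tag as that element's "tail", so the sketch moves the tail of each removed <asmbly> onto its neighbour before deleting it, otherwise the "Sample firstpar text" part would be lost too.

```python
import xml.etree.ElementTree as ET

# Trimmed-down version of the sample in the question (assumption: the real
# file has the same overall shape).
SAMPLE = """<firstpar>
    <thumbcred>Sample 1 thumbcred</thumbcred>
    <asmbly>
        <caption><p>Sample 1 <e>sample 1 caption</e></p></caption>
        <credit>Paul Miller/AP</credit>
    </asmbly>
Sample firstpar text <e>Sample e</e> just some
text <sub>sample sub </sub><e>sample e text again</e> more text.
</firstpar>"""


def text_without_asmbly(xml_text):
    """Hypothetical helper: drop every <asmbly> subtree, return remaining text."""
    root = ET.fromstring(xml_text)
    # Snapshot the elements first so the tree is not modified while iterating.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag != "asmbly":
                continue
            # Text after </asmbly> lives in child.tail; keep it by attaching
            # it to the previous sibling (or to the parent's leading text).
            tail = child.tail or ""
            index = list(parent).index(child)
            if index == 0:
                parent.text = (parent.text or "") + tail
            else:
                previous = parent[index - 1]
                previous.tail = (previous.tail or "") + tail
            parent.remove(child)
    # itertext() walks all remaining text nodes in document order;
    # split/join collapses the runs of whitespace and newlines.
    return " ".join("".join(root.itertext()).split())


if __name__ == "__main__":
    print(text_without_asmbly(SAMPLE))
```

Note that this keeps the <thumbcred> text as well. If you only want the loose text of <firstpar>, the same loop could be extended to remove other tags in the same way.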


If you have a very specifically formed piece of XML and a very specific goal, then it is possible to use regex to perform some operations on it. But once you apply that regex to XML that does not follow the exact shape you assumed, it will no longer handle it correctly.
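
To make that caveat concrete, here is a small Python sketch of the regex route (the pattern and the name drop_asmbly_with_regex are only illustrative assumptions). It works as long as an <asmbly> never contains another <asmbly>, which is true for the sample in the question, and it visibly breaks as soon as that assumption fails.

```python
import re

# Non-greedy match from an opening <asmbly ...> tag to the NEAREST closing
# tag. re.DOTALL lets "." cross line breaks. Assumes <asmbly> is never nested
# inside another <asmbly>.
ASMBLY = re.compile(r"<asmbly\b[^>]*>.*?</asmbly>", re.DOTALL)


def drop_asmbly_with_regex(xml_text):
    """Hypothetical helper: delete each <asmbly>...</asmbly> span as raw text."""
    return ASMBLY.sub("", xml_text)


# Works for the flat case from the question:
flat = "<firstpar><asmbly><credit>c</credit></asmbly>text</firstpar>"
flat_result = drop_asmbly_with_regex(flat)

# Breaks on nesting: the match stops at the first </asmbly>, leaving a stray
# closing tag behind, so the output is no longer well-formed XML.
nested = "<a><asmbly>x<asmbly>y</asmbly>z</asmbly>tail</a>"
nested_result = drop_asmbly_with_regex(nested)
```

In the nested case the leftover "z</asmbly>" is exactly the kind of failure a parser avoids.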

Upvotes: 2
