Ian W

Reputation: 4767

Conditionally remove sections of an XML file

I am looking for a solution to this problem and suspect awk could provide a simpler one than my clumsy shell script.

I have an XML file consisting of multiple sections, as shown below. I also have a list of values.

For each <top_tag> ... </top_tag> section whose tag value (value_x) is in my list, I want to delete the section, i.e. not print it.

<xml>
<outer_tag>
   <top_tag>
      <tag>value_1</tag>
      <other_tags></other_tags>
   </top_tag>
   <top_tag>
      <tag>value_2</tag>
      <other_tags></other_tags>
   </top_tag>
    ...
   <top_tag>
      <tag>value_n</tag>
      <other_tags></other_tags>
   </top_tag>
</outer_tag>

Your suggestions are most appreciated.
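One possible awk sketch of this rule. It assumes (hypothetically) that the list of values sits in a file with one value per line, that every element is on its own line as in the sample above, and that top_tag blocks do not nest; drop_sections is an illustrative name, not an existing tool. Each block is buffered and printed only if its tag value is not in the list:

```shell
#!/bin/sh
# drop_sections VALUES_FILE XML_FILE  (hypothetical helper name)
# Prints XML_FILE, omitting each <top_tag>...</top_tag> block whose <tag>
# value appears (one per line) in VALUES_FILE. Assumes one element per
# line, as in the question's sample, and no nested top_tag blocks.
drop_sections() {
    awk '
        # First file: remember every value whose section should be dropped.
        FILENAME == ARGV[1] { drop[$0] = 1; next }

        # A section opens: start a fresh buffer.
        /<top_tag>/ { inblock = 1; buf = ""; skip = 0 }

        inblock {
            buf = buf $0 "\n"
            # Pull out the text between <tag> and </tag> on this line, if any.
            if (match($0, /<tag>[^<]*<\/tag>/)) {
                val = substr($0, RSTART + 5, RLENGTH - 11)
                if (val in drop) skip = 1
            }
            # The section closes: print the buffer unless it was marked.
            if ($0 ~ /<\/top_tag>/) {
                if (!skip) printf "%s", buf
                inblock = 0
            }
            next
        }

        # Lines outside any section pass through unchanged.
        { print }
    ' "$1" "$2"
}

# Usage: drop_sections values.txt input.xml > output.xml
```

Testing FILENAME == ARGV[1] instead of the common NR == FNR idiom keeps the script correct when the list file is empty, and matching the whole value exactly means value_1 in the list does not remove value_10.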

Upvotes: 1

Views: 2784

Answers (2)

potong

Reputation: 58371

This might work for you:

 sed -i '/<top_tag>/,/<\/top_tag>/!b;/<top_tag>/{h;d};H;/<\/top_tag/!d;x;/<tag>value.*<\/tag>/d' file
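As written, the final pattern /<tag>value.*<\/tag>/ matches every block in the sample, so it acts as a template to narrow down to the values you want removed. A minimal sketch of building that pattern from a list, assuming (hypothetically) one value per line in a file, values free of regex metacharacters and slashes, and a sed that supports -E (GNU and BSD sed both do); remove_sections is an illustrative name:

```shell
#!/bin/sh
# remove_sections VALUES_FILE XML_FILE  (hypothetical helper name)
# Same hold-space technique as the one-liner, but deletes only the blocks
# whose <tag> value is listed, one per line, in VALUES_FILE.
# Assumes one element per line and values without regex metacharacters.
remove_sections() {
    # Join the list into an ERE alternation, e.g. "value_1|value_3".
    alt=$(paste -s -d '|' "$1")
    sed -E '
        # Lines outside a <top_tag> block: jump to the end (auto-printed).
        /<top_tag>/,/<\/top_tag>/!b
        # Opening line: start a fresh copy of the block in the hold space.
        /<top_tag>/{h;d;}
        # Any other line of the block: append it to the hold space ...
        H
        # ... and print nothing until the closing line arrives.
        /<\/top_tag>/!d
        # Closing line: swap, so the whole block is in the pattern space.
        x
        # Drop the block if its <tag> holds one of the listed values.
        /<tag>('"$alt"')<\/tag>/d
    ' "$2"
}

# Usage: remove_sections values.txt input.xml > output.xml
```

To edit the file in place as the one-liner does, GNU sed accepts -i on its own, while BSD sed needs -i ''.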

Upvotes: 2

toniedzwiedz

Reputation: 18543

What you need here is not awk but XSLT, which was created specifically for this kind of task. It lets you transform an XML document into a different XML document.

For an input much like yours:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="example.xsl"?>
<outer_tag>
   <top_tag>
      <tag>value_1</tag>
      <other_tags></other_tags>
   </top_tag>
   <top_tag>
      <tag>value_2</tag>
      <other_tags></other_tags>
   </top_tag>
   <top_tag>
      <tag>value_3</tag>
      <other_tags></other_tags>
   </top_tag>
   <top_tag>
      <tag>value_n</tag>
      <other_tags></other_tags>
   </top_tag>
</outer_tag>

The following XSLT removes every top_tag element whose tag is value_3 by simply not copying it, or any of its contents, to the output.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <!-- Identity template: copy every element, attribute and text node unchanged -->
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <!-- Empty template: a matching top_tag produces no output at all.
         For several values: top_tag[tag = 'value_3' or tag = 'value_7'] -->
    <xsl:template match="top_tag[tag = 'value_3']"/>
</xsl:stylesheet>

Most major programming languages have libraries that can apply an XSLT stylesheet to an XML document. Command-line tools such as xsltproc and GUI applications (IDEs among others) can do it as well. Finally, web browsers can transform files with XSLT if you reference the .xsl file in a processing instruction like this:

<?xml-stylesheet type="text/xsl" href="example.xsl"?>

Upvotes: 2
