Reputation: 27221
I have an xml file that I want to configure using a bash script. For example if I had this xml:
<a>
<b>
<bb>
<yyy>
Bla
</yyy>
</bb>
</b>
<c>
<cc>
Something
</cc>
</c>
<d>
bla
</d>
</a>
(confidential info removed)
I would like to write a bash script that will remove section <b> (or comment it out) but keep the rest of the xml intact. I am pretty new to the whole scripting thing. I was wondering if anyone could give me a hint as to what I should look into.
I was thinking that sed could be used, except sed is a line editor. I think it would be easy to remove the <b> tags; however, I am unsure if sed would be able to remove all the text between the <b> tags.
I will also need to write a script to add back the deleted section.
Upvotes: 25
Views: 46616
Reputation: 1
sed -i '/<b>/,/<\/b>/d' foo.xml
Will this work if the b tag also has an attribute defined?
In HTML, for example, the b tag might start as <b id="Test Step">
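On the attribute question: the pattern /<b>/ only matches the literal text <b>, so an opening tag such as <b id="Test Step"> would not start the range. A minimal sketch of one way around that, assuming the opening and closing tags each sit on their own line and that any attribute is separated from the tag name by a space (not a tab or newline). The file name sample.xml and its contents are made up for illustration:

```shell
# Build a small sample where the b tag carries an attribute (hypothetical content).
cat > sample.xml <<'EOF'
<a>
<b id="Test Step">
<bb>Bla</bb>
</b>
<c>Something</c>
</a>
EOF

# Start the range at "<b" followed by a space or ">", so both <b> and
# <b id="..."> match, while <bb> does not. The range ends at </b>.
result=$(sed '/<b[ >]/,/<\/b>/d' sample.xml)
printf '%s\n' "$result"
```

This writes to standard output rather than using -i, so the original file is left untouched while you check the result.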
Upvotes: 0
Reputation: 289
Using xmlstarlet:
# Delete every b element anywhere in the document
# (use "/a/b" instead of "//b" to delete only the b directly under a)
xmlstarlet ed -d "//b" file.xml > tmp.xml
mv tmp.xml file.xml
Upvotes: 16
Reputation: 342273
@OP, you can use awk (the multi-character RS used here needs gawk or mawk), e.g.
$ cat file
<a>
some text before <b>
<bb>
<yyy>
Bla
</yyy>
</bb>
</b> some text after
<c>
<cc>
Something
</cc>
</c>
<d>
bla
</d>
</a>
$ awk 'BEGIN{RS="</b>"}/<b>/{gsub(/<b>.*/,"")}1' file
<a>
some text before
some text after
<c>
<cc>
Something
</cc>
</c>
<d>
bla
</d>
</a>
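For awk implementations that only honor the first character of RS, the same idea can be sketched with a flag that tracks whether the current position is inside the b section. This is a rough illustration, not the answer's method: it assumes a plain <b> tag without attributes and no nested b elements, and the variable names (inside, out, line) and the sample file sample_b.xml are made up for this sketch.

```shell
cat > sample_b.xml <<'EOF'
<a>
some text before <b>
<bb>Bla</bb>
</b> some text after
<c>Something</c>
</a>
EOF

result=$(awk '
{
    line = $0
    out = ""
    # Walk through the line, keeping text outside <b>...</b> only.
    while (line != "") {
        if (!inside) {
            i = index(line, "<b>")
            if (i == 0) { out = out line; line = "" }
            else { out = out substr(line, 1, i - 1); line = substr(line, i + 3); inside = 1 }
        } else {
            j = index(line, "</b>")
            if (j == 0) { line = "" }
            else { line = substr(line, j + 4); inside = 0 }
        }
    }
    # Print lines that still have content, plus blank lines outside the section.
    if (out != "" || (!inside && $0 == "")) print out
}' sample_b.xml)
printf '%s\n' "$result"
```

Text that shares a line with the opening or closing tag is kept, as in the example above; lines that contained only the tags or only section content are dropped.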
Upvotes: 3
Reputation: 690
This would not be difficult to do in sed, as sed also works on ranges.
Try this (assuming xml is in a file named foo.xml):
sed -i '/<b>/,/<\/b>/d' foo.xml
-i will write the change into the original file (use -i.bak to keep a backup copy of the original)
This sed command will perform an action d (delete) on all of the lines specified by the range
# all of the lines between a line that matches <b>
# and the next line that matches <\/b>, inclusive
/<b>/,/<\/b>/
So, in plain English, this command will delete all of the lines between and including the line with <b> and the line with </b>. It works on whole lines, so any other text that shares a line with either tag is deleted too.
If you'd rather comment out the lines, try one of these:
# block comment
sed -i 's/<b>/<!-- <b>/; s/<\/b>/<\/b> -->/' foo.xml
# comment out every line in the range
sed -i '/<b>/,/<\/b>/s/.*/<!-- & -->/' foo.xml
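Since the question also asks for a way to put the section back, the block-comment form can be undone with a second pair of substitutions. A rough sketch, assuming the b tags carry no attributes, the section appears once, and the section itself contains no XML comments. The file names original.xml, commented.xml and restored.xml are placeholders, and the commands write to new files instead of using -i so each step can be inspected:

```shell
cat > original.xml <<'EOF'
<a>
<b>
<bb>Bla</bb>
</b>
<c>Something</c>
</a>
EOF

# Comment out: wrap the opening and closing b tags in comment delimiters.
sed 's/<b>/<!-- <b>/; s/<\/b>/<\/b> -->/' original.xml > commented.xml

# Restore: strip exactly the delimiters that were added above.
sed 's/<!-- <b>/<b>/; s/<\/b> -->/<\/b>/' commented.xml > restored.xml

# cmp -s is silent and succeeds only if the two files are identical.
cmp -s original.xml restored.xml && echo "round trip OK"
```

Running the comment-out step twice would nest the delimiters, so a script that toggles the section should check first (for example with grep) which state the file is in.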
Upvotes: 31
Reputation: 66714
You can use an XSLT stylesheet such as this one, which is a modified identity transform. It copies all of the content by default and has an empty template for b that does nothing (effectively deleting it from the output):
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<!--Identity transform copies all items by default -->
<xsl:template match="@* | node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<!--Empty template to match on b elements and prevent it from being copied to output -->
<xsl:template match="b"/>
</xsl:stylesheet>
Create a bash script that executes the transform using Java and the Xalan commandline utility like this:
java org.apache.xalan.xslt.Process -IN foo.xml -XSL foo.xsl -OUT foo.out
The result is this:
<?xml version="1.0" encoding="UTF-16"?><a><c><cc>
Something
</cc></c><d>
bla
</d></a>
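If Java and Xalan are not at hand, the same stylesheet can probably be applied with xsltproc from libxslt instead. This is a swapped-in tool, not the one used in the answer, and the sketch assumes xsltproc is installed; it skips the transform otherwise. The file names match the ones above, and the sample XML is shortened for illustration:

```shell
cat > foo.xml <<'EOF'
<a>
<b>
<bb>Bla</bb>
</b>
<c>Something</c>
</a>
EOF

cat > foo.xsl <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<!-- Identity transform: copy everything by default -->
<xsl:template match="@* | node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<!-- Empty template: b elements are not copied to the output -->
<xsl:template match="b"/>
</xsl:stylesheet>
EOF

if command -v xsltproc >/dev/null 2>&1; then
    # -o names the output file; the stylesheet comes before the input document.
    xsltproc -o foo.out foo.xsl foo.xml
else
    echo "xsltproc not found; skipping the transform" >&2
fi
```

The exact whitespace and XML declaration in foo.out may differ from the Xalan output shown above, since each processor serializes slightly differently.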
EDIT: if you would prefer to have the b element commented out, to make it easier to put back, then use this stylesheet:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<!--Identity transform copies all items by default -->
<xsl:template match="@* | node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<!--Match on b element, wrap in a comment and construct text representing XML structure by applying templates in "comment" mode -->
<xsl:template match="b">
<xsl:comment>
<xsl:apply-templates select="self::*" mode="comment" />
</xsl:comment>
</xsl:template>
<xsl:template match="*" mode="comment">
<xsl:value-of select="'&lt;'"/>
<xsl:value-of select="name()"/>
<xsl:value-of select="'&gt;'"/>
<xsl:apply-templates select="@*|node()" mode="comment" />
<xsl:value-of select="'&lt;/'"/>
<xsl:value-of select="name()"/>
<xsl:value-of select="'&gt;'"/>
</xsl:template>
<xsl:template match="text()" mode="comment">
<xsl:value-of select="."/>
</xsl:template>
<xsl:template match="@*" mode="comment">
<xsl:value-of select="name()"/>
<xsl:text>="</xsl:text>
<xsl:value-of select="."/>
<xsl:text>" </xsl:text>
</xsl:template>
</xsl:stylesheet>
It produces this output:
<?xml version="1.0" encoding="UTF-16"?><a><!--<b><bb><yyy>
Bla
</yyy></bb></b>--><c><cc>
Something
</cc></c><d>
bla
</d></a>
Upvotes: 10
Reputation: 6163
If you want the most appropriate replacement for sed for XML data, it would be an XSLT processor. Like sed, it's a complex language, but specialized for the task of XML-to-anything transformations.
On the other hand, this does seem to be the point at which I would seriously consider switching to a real programming language, like Python.
Upvotes: 6