user2568737

Reputation: 29

SED - stripping certain tags from XML file

I have some xml files littered with tags like this:

<?xm-insertion_mark_start author="some_author" time="20050602T125956-0500"?>  

How would I strip such inserts? I've tried this, to no avail:

sed -e 's/<\?xm.*?\?>//g' in.xml > out.xml

Upvotes: 0

Views: 428

Answers (2)

Tomalak

Reputation: 338148

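sed does not support reluctant ("non-greedy") quantifiers such as .*?, so use a negated character class instead. Try this:

sed '/<?xm-[^>]*?>/d' in.xml > out.xml

Note that d deletes every line containing a match, not just the tag itself. Matching on xm- rather than xm keeps the <?xml ...?> declaration from being caught.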
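If the inserts share a line with real content, a substitution that removes only the tag may be closer to what the question asks for. The following is a minimal sketch, not a general XML solution: the function name strip_xm_pis is made up for illustration, and it assumes that every unwanted processing instruction starts with xm- and contains no literal > before its closing ?>.

```shell
# Remove every processing instruction whose target starts with "xm-",
# keeping the rest of each line and the <?xml ...?> declaration.
# In a basic regular expression "?" is a literal character, and [^>]*
# stands in for the non-greedy ".*?" that sed does not support.
strip_xm_pis() {
    sed 's/<?xm-[^>]*?>//g' "$@"
}
```

It can be called as strip_xm_pis in.xml > out.xml, or used at the end of a pipeline, since sed reads standard input when no file is given.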

EDIT: Of course you could use XSLT to safely remove the processing instruction (PI) from the XML.

This removes all PIs named xm-insertion_mark_start but leaves all remaining XML untouched.

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="node() | @*">
        <xsl:copy>
            <xsl:apply-templates select="node() | @*" />
        </xsl:copy>
    </xsl:template>

    <xsl:template match="processing-instruction('xm-insertion_mark_start')" />
</xsl:stylesheet>

Use

<xsl:template match="processing-instruction()" />

if you want to remove all PIs regardless of their name.

You can use xsltproc(1) to apply the transformation to your XML on the command line.
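A rough sketch of that invocation, assuming the stylesheet above has been saved to a file (strip-pi.xsl is a hypothetical name, as is the wrapper function strip_pis):

```shell
# strip_pis STYLESHEET INPUT OUTPUT
# Applies the stylesheet to the input document and writes the result
# to OUTPUT using xsltproc's -o option.
strip_pis() {
    xsltproc -o "$3" "$1" "$2"
}
```

For example: strip_pis strip-pi.xsl in.xml out.xml. Unlike the sed approach, this parses the document, so it requires in.xml to be well-formed.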

Upvotes: 1

anubhava

Reputation: 784958

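Instead of the substitute (s) command, use sed's delete (d) command, which drops every line that matches. You can also edit the file in place with -i (here keeping a backup as in.xml.bak):

sed -i.bak '/<?xm-.*?>/d' in.xml

Using grep:

grep -v '<?xm-.*?>' in.xml > out.xml

In both patterns ? is a literal character and must not be escaped: in GNU basic regular expressions \? means "zero or one of the preceding item". Matching on xm- rather than xm leaves the <?xml ...?> declaration alone.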

Caution: shell utilities are not always the best tools for parsing & editing XML data.

Upvotes: 2
