Reputation: 29
I have some xml files littered with tags like this:
<?xm-insertion_mark_start author="some_author" time="20050602T125956-0500"?>
How would I strip such inserts? I've tried this to no avail:
sed -e 's/<\?xm.*?\?>//g' in.xml > out.xml
Upvotes: 0
Views: 428
Reputation: 338148
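sed does not have reluctant ("non-greedy") quantifiers, and in its basic regex syntax an unescaped ? is a literal question mark. Use a negated character class instead (this assumes the insert itself contains no ? before its closing ?>):
sed 's/<?xm[^?]*?>//g' in.xml > out.xml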
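As a rough illustration of why a greedy .* is a problem here, the sketch below runs two substitutions over a made-up line that holds two inserts. The file names (sample.xml, greedy.xml, negated.xml) and the xm-insertion_mark_end instruction are assumptions for the demo, not taken from the question.

```shell
# One line holding two processing instructions with real text between them.
printf '%s\n' '<a><?xm-insertion_mark_start author="x"?>text<?xm-insertion_mark_end?></a>' > sample.xml

# Greedy: .* runs to the LAST ?> on the line, so "text" is removed as well.
sed 's/<?xm.*?>//g' sample.xml > greedy.xml

# Negated class: [^?]* stops at the first ?, so each insert is removed separately.
sed 's/<?xm[^?]*?>//g' sample.xml > negated.xml

cat greedy.xml negated.xml
```

The first line printed should be <a></a> (the text is lost) and the second <a>text</a>.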
EDIT: Of course, you could use XSLT to safely remove the processing instruction (PI) from the XML.
The following stylesheet removes all PIs named xm-insertion_mark_start but leaves all remaining XML untouched.
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="node() | @*">
<xsl:copy>
<xsl:apply-templates select="node() | @*" />
</xsl:copy>
</xsl:template>
<xsl:template match="processing-instruction('xm-insertion_mark_start')" />
</xsl:stylesheet>
Use
<xsl:template match="processing-instruction()" />
if you want to remove all PIs regardless of their name.
You can use xsltproc(1) to apply the transformation to your XML on the command line.
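A minimal sketch of that command-line step, assuming xsltproc is installed and that the stylesheet is saved as strip-pi.xsl (a made-up name). The one-line in.xml written here is only sample input for the demo.

```shell
# Save the stylesheet (identity transform plus an empty template for the PI).
cat > strip-pi.xsl <<'EOF'
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="node() | @*">
    <xsl:copy>
      <xsl:apply-templates select="node() | @*"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="processing-instruction('xm-insertion_mark_start')"/>
</xsl:stylesheet>
EOF

# Sample input containing one insert (hypothetical content).
cat > in.xml <<'EOF'
<doc><?xm-insertion_mark_start author="some_author" time="20050602T125956-0500"?><p>text</p></doc>
EOF

# Usage: xsltproc STYLESHEET INPUT; the result goes to stdout.
if command -v xsltproc >/dev/null 2>&1; then
  xsltproc strip-pi.xsl in.xml > out.xml
  cat out.xml
else
  echo "xsltproc is not installed; skipping the transformation" >&2
fi
```

Unlike the regex approaches, this also copes with inserts that span several lines, because the XML is parsed rather than pattern-matched.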
Upvotes: 1
Reputation: 784958
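Instead of the substitute (s) command, use sed's delete (d) command, which drops every line that matches the pattern. This works when each insert sits on a line of its own.
You can also use sed -i to edit the file in place (the .bak suffix keeps a backup of the original), like this:
sed -i.bak '/<?xm.*?>/d' in.xml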
Using grep:
grep -v '<?xm.*?>' in.xml > out.xml
Caution: shell utilities are not always the best tools for parsing & editing XML data.
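One concrete way this caution shows up: line-based deletion drops everything else on a matching line. A small sketch, with made-up file names and content:

```shell
# The first line mixes an insert with ordinary element content.
printf '%s\n' \
  '<p>keep <?xm-insertion_mark_start author="a"?>this</p>' \
  '<p>other</p>' > mixed.xml

# Both commands remove the whole first line, including the <p> element.
sed '/<?xm.*?>/d' mixed.xml > deleted.xml
grep -v '<?xm.*?>' mixed.xml > filtered.xml

cat deleted.xml
```

Only <p>other</p> survives, so these one-liners are safe only when every insert occupies a line by itself.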
Upvotes: 2