Will WP

Reputation: 1237

Remove all contents between two tags in a bunch of XML files

I have more than 500 XML files, all with a similar structure. Each has a <stream> tag and its corresponding </stream> tag, with many lines of text in between. Is there a way to quickly remove everything between the two tags (possibly including the tags themselves) without having to manually select and delete all the text (which is a lot)?

I use Notepad to open these files but can use other software if needed.

Upvotes: 0

Views: 377

Answers (1)

Martin Honnen

Reputation: 167581

Use XSLT, e.g. a stylesheet that copies everything except the stream elements:

<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">

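  <!-- Identity template: copies every attribute and node unchanged. -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Empty template: drops each stream element, tags and contents included. -->
  <xsl:template match="stream"/>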

</xsl:stylesheet>

Various XSLT processors have a command-line interface, or you can use a few lines of PowerShell (xslt.xsl is the stylesheet above, saved under that name), e.g.

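# Compile the stylesheet, then apply it to a single file.
# Full paths are used because .NET resolves relative paths against the
# process working directory, which can differ from the PowerShell location.
$xslt = New-Object System.Xml.Xsl.XslCompiledTransform
$xslt.Load((Join-Path $PWD "xslt.xsl"))
$xslt.Transform((Join-Path $PWD "input.xml"), (Join-Path $PWD "output.xml"))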
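
Since the question involves several hundred files, the single-file call above can be wrapped in a loop. This is a minimal sketch, not a tested batch tool. It assumes the XML files sit in a subfolder named input and that results should go to a subfolder named output (both names are hypothetical), and that it is run from the folder containing xslt.xsl:

```powershell
# Sketch only: "input" and "output" are hypothetical folder names.
# Full paths avoid surprises with how .NET resolves relative paths.
$inDir  = Join-Path $PWD "input"
$outDir = Join-Path $PWD "output"
New-Item -ItemType Directory -Force -Path $outDir | Out-Null

# Compile the stylesheet once and reuse it for every file.
$xslt = New-Object System.Xml.Xsl.XslCompiledTransform
$xslt.Load((Join-Path $PWD "xslt.xsl"))

# Write each result under the same file name in the output folder,
# leaving the originals untouched.
Get-ChildItem -Path $inDir -Filter *.xml | ForEach-Object {
    $xslt.Transform($_.FullName, (Join-Path $outDir $_.Name))
}
```

Writing to a separate folder keeps the originals intact until the results have been checked.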

Upvotes: 2
