Sean Scarfo

Reputation: 1

How to search and replace XML tags and entries for 10,000 XML files

I need to update XML for 10,000+ files.

I'm a newbie programmer, so I'd prefer something that works out of the box if possible (an existing solution?). If not, I'm not afraid to learn and try new things. I am taking a college course (Programming Logic) to get my feet wet, but of course that won't provide an immediate result.

All the files are in their own serial-number-based directories. Each file is called 83_XYZETC.xml.

Each of these XML files contains two tags (with their contents) that need to be found and have all instances removed.

example:

<mediaFile>
content 123
</mediaFile>

<image>
image info 123
</image>

I then also need to insert a different tag, with its content, inside another tag. Example:

                  <track>
Need to insert>>  <action>UPDATE</action>
                  extra stuff etc 
                  more stuff
                  even more
                  </track>

Lastly, I need to insert a line of text inside a tag, but at the end.
Example:

<right type="labelDownload">Y</right>
</track>

I'd appreciate any suggestions. Windows platform preferred. Thank you!

Upvotes: 0

Views: 1523

Answers (2)

innovimax

Reputation: 560

Also, have you had a look at XProc?

Upvotes: 0

Michael Kay

Reputation: 163468

XSLT allows you to express your transformation rules in a form fairly similar to your English description.

You start with a template rule that says "by default, when you hit an element, copy it and process its children":

<xsl:template match="*">
  <xsl:copy>
    <xsl:copy-of select="@*"/>
    <xsl:apply-templates/>
  </xsl:copy>
</xsl:template>

You want rules for the mediaFile and image elements that say "delete this element" (i.e., when you encounter it, output nothing):

<xsl:template match="mediaFile | image"/>

And for the track element, if I understand you right, you want to add some standard content at the start and the end:

<xsl:template match="track">
  <xsl:copy>
    <action>UPDATE</action>
    <xsl:apply-templates/>
    <right type="labelDownload">Y</right>
  </xsl:copy>
</xsl:template>

That's all there is to the stylesheet, other than a boilerplate xsl:stylesheet element to wrap it all up.
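For reference, here is a sketch of what the assembled stylesheet might look like once the three rules are wrapped in that boilerplate. Two assumptions are mine rather than the question's: the elements in the input files are in no namespace (otherwise the match patterns would need a prefix), and version="2.0" is declared on the guess that an XSLT 2.0 processor will be used.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Assumption: version 2.0 and no-namespace input; adjust both to suit your files. -->
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Default rule: copy each element and its attributes, then process its children. -->
  <xsl:template match="*">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>

  <!-- Delete mediaFile and image elements by producing no output for them. -->
  <xsl:template match="mediaFile | image"/>

  <!-- Copy track, adding action as its first child and right as its last child. -->
  <xsl:template match="track">
    <xsl:copy>
      <action>UPDATE</action>
      <xsl:apply-templates/>
      <right type="labelDownload">Y</right>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Note that the track rule as written does not copy any attributes on track itself. If your track elements carry attributes, adding xsl:copy-of select="@*" as the first instruction inside that rule's xsl:copy should preserve them. Comments and processing instructions in the input are also dropped, because nothing in the stylesheet copies them.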

Then you need to apply it to your 10,000 input documents. You could do this with Ant; others would use a shell script, or David Lee's xmlsh (a shell-like scripting language for XML processing), or, if you're feeling more enterprising, XProc. Or you could write a little Java application. It really depends on what you're most comfortable with. But if you don't want to learn yet another language, you can also do it within XSLT 2.0, though it's a little processor-dependent. With Saxon, you can add a template rule:

<xsl:template name="main">
  <xsl:for-each select="collection('.?select=*.xml')">
    <xsl:result-document href="{tokenize(document-uri(.), '/')[last()]}">
      <xsl:apply-templates/>
    </xsl:result-document>
  </xsl:for-each>
</xsl:template>

and then, having installed Saxon, all you need to do is run this, with your current directory being the one containing the XML files:

java net.sf.saxon.Transform -xsl:stylesheet.xsl -it:main -o:../output/result.xml

Upvotes: 3
