Peter Morris

Reputation: 23254

Using RegEx to rename XML tags

I am importing lots of employees from a single XML file. Each employee has a section called <officeData> and a section called <personalData>. The children of those two nodes look exactly the same and have the same name <dataItem>.

I want to rename the elements within <personalData> to <personalDataItem>, and obviously cannot use a global search/replace because of the children of the <officeData> node. If I use a lookbehind / lookahead to check that I am within <personalData>, it will find that tag for the previous or next employee.

Is there any way I can specify a regex pattern to rename only the multiple children within a specific parent XML node?

Upvotes: 0

Views: 649

Answers (2)

zx81

Reputation: 41848

This situation sounds straight out of "Match (or replace) a pattern except in situations s1, s2, s3 etc."

With all the disclaimers about using regex to parse XML, here is a simple way to do it.

Here's our simple regex:

<officeData>.*?</officeData>|(dataItem>)

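The left side of the alternation matches complete officeData sections, from the opening tag to the closing tag. We will ignore these matches. The right side matches and captures dataItem> to Group 1, and we know these are the right dataItem> because they were not matched by the expression on the left. If an officeData section spans several lines, turn on the "dot matches newline" option (DOTALL, or the s flag) so that .*? can cross line breaks.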

On the online demo, notice how only the correct dataItem> are highlighted and captured to Group 1, as shown in the bottom right panel.

In your language, in the replacement function, you just check whether Group 1 is set. If so, you replace the match with personalDataItem>. If not, you replace the match with itself (i.e., no change).

This is a straightforward task, and, depending on your language, you may be able to find code samples for this Group 1 check in the referenced article.
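As an illustration of the Group 1 check described above, here is a minimal Python sketch. The function name rename_with_regex and the sample input are made up for this example, and re.DOTALL is added on the assumption that officeData sections may span several lines.

```python
import re

# Left side: whole officeData sections (matched, then left unchanged).
# Right side: any remaining "dataItem>" captured to Group 1.
PATTERN = re.compile(r"<officeData>.*?</officeData>|(dataItem>)", re.DOTALL)


def rename_with_regex(xml_text: str) -> str:
    """Rename dataItem tags that are not inside an officeData section."""

    def replace(match: re.Match) -> str:
        if match.group(1) is not None:
            # Group 1 is set: this dataItem> is outside officeData.
            return "personalDataItem>"
        # Group 1 is not set: an officeData section, put it back as-is.
        return match.group(0)

    return PATTERN.sub(replace, xml_text)


# Hypothetical sample input, not taken from the question.
sample = (
    "<employee>"
    "<officeData><dataItem>desk 4</dataItem></officeData>"
    "<personalData><dataItem>blue</dataItem></personalData>"
    "</employee>"
)
result = rename_with_regex(sample)
```

Note that this pattern only sees tags written exactly as <officeData> and dataItem>, so tags carrying attributes would need a looser pattern.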

Reference

  1. How to match (or replace) a pattern except in situations s1, s2, s3...
  2. How to match a pattern unless...

Upvotes: 1

Ian Roberts

Reputation: 122414

This is not a job for a regular expression, but it would be simple using an XSLT stylesheet:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <!-- copy everything unchanged ... -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()" />
    </xsl:copy>
  </xsl:template>

  <!-- ... except dataItem inside personalData, which we rename -->
  <xsl:template match="personalData/dataItem">
    <personalDataItem>
      <xsl:apply-templates select="@*|node()" />
    </personalDataItem>
  </xsl:template>
</xsl:stylesheet>
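If no XSLT processor is at hand, the same parser-based idea can be sketched with Python's standard xml.etree.ElementTree module. This is not the stylesheet above, only an equivalent under the assumption that the file fits in memory; the function name rename_with_parser and the sample document are made up for this example.

```python
import xml.etree.ElementTree as ET


def rename_with_parser(xml_text: str) -> str:
    """Rename dataItem elements that are direct children of personalData."""
    root = ET.fromstring(xml_text)
    # iter() visits every personalData element in the tree, for every employee.
    for personal in root.iter("personalData"):
        # findall() with a bare tag name returns direct children only,
        # which mirrors the personalData/dataItem match pattern.
        for item in personal.findall("dataItem"):
            item.tag = "personalDataItem"
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample input, not taken from the question.
sample_doc = (
    "<employees>"
    "<employee>"
    "<officeData><dataItem>desk 4</dataItem></officeData>"
    "<personalData><dataItem>blue</dataItem></personalData>"
    "</employee>"
    "<employee>"
    "<officeData><dataItem>desk 9</dataItem></officeData>"
    "<personalData><dataItem>green</dataItem></personalData>"
    "</employee>"
    "</employees>"
)
renamed = rename_with_parser(sample_doc)
```

Because the parser tracks parent and child relationships itself, there is no risk of matching a tag that belongs to the previous or next employee.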

Upvotes: 4
