nikopartanen

Reputation: 617

Using R to modify XML node order (or adding a new node into a specific place)

I know this would probably be easiest to do with XSL, but I need to do it in R. I have the following kind of XML file:

<xml>
  <info>Some info here</info>
  <node id='1'/>
  <node id='2'/>
  <node id='3'/>
  <comment>Some comment here</comment>
</xml>

If I need to add a new node, I do it like this:

library(XML)

xml <- "<xml><info>Some info here</info><node id='1'/><node id='2'/><node id='3'/><comment>Some comment here</comment></xml>"

doc <- xmlParse(xml)

doc <- getNodeSet(doc, "//xml")

new_node <- newXMLNode("node", attrs = c(id = 4))

# Is there a way to specify here where to insert the node?

doc <- addChildren(doc[[1]], new_node)

# Or can one sort it somehow once it is there?

However, this adds the new node to the end of the document, like this:

<xml>
  <info>Some info here</info>
  <node id="1"/>
  <node id="2"/>
  <node id="3"/>
  <comment>Some comment here</comment>
  <node id="4"/>
</xml> 

I would need to maintain the original structure, which contains the elements in a specific order. For the sake of clarity, here is what I need:

<xml>
  <info>Some info here</info>
  <node id="1"/>
  <node id="2"/>
  <node id="3"/>
  <node id="4"/>
  <comment>Some comment here</comment>
</xml>

As I tried to explain in the title, I think there are two solutions to this problem. One would be to insert the new node between the last <node> and <comment>. The other would be to append it to the end, then specify the right element order and sort the document accordingly.

Upvotes: 2

Views: 890

Answers (2)

Parfait

Reputation: 107652

Here is an XSLT solution in R, for future readers and in case your modification becomes more complex (or customized). Because R does not yet have a universal XSLT library, it can call an installed XSLT processor through RDCOMClient or the command line. Since most general-purpose languages maintain XSLT libraries (Java, C#, PHP, Perl, Python, VB), R can leverage them accordingly.

The examples below show an MSXML approach (the same object model VBA uses) and a Python approach. The MSXML object comes with a Microsoft Office installation. The open-source Python script runs from the command line and uses the lxml module.

R (using RDCOMClient, assuming you have Microsoft Office installed on PC; can easily translate as an Excel macro)

library(XML)
library(RDCOMClient)

# ADD NEEDED NODE(S)
xml <- "<xml><info>Some info here</info><node id='1'/><node id='2'/><node id='3'/><comment>Some comment here</comment></xml>"
doc <- xmlParse(xml)
doc <- getNodeSet(doc, "//xml")
new_node <- newXMLNode("node", attrs = c(id = 4))
doc <- addChildren(doc[[1]], new_node)
xmlstr <- saveXML(doc)

# INITIALIZE MSXML OBJECTS
xmlfile = COMCreate("MSXML2.DOMDocument")
xslfile = COMCreate("MSXML2.DOMDocument")
newxmlfile = COMCreate("MSXML2.DOMDocument")

# XSLT TRANSFORMATION
xslstr = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>
  <xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">
  <xsl:output omit-xml-declaration=\"yes\" indent=\"yes\"/>
  <xsl:strip-space elements=\"*\"/>

  <xsl:template match=\"xml\">
  <xml>
    <xsl:copy-of select=\"info\"/>  
    <xsl:copy-of select=\"node\"/>
    <xsl:copy-of select=\"comment\"/>
  </xml>
  </xsl:template>  
  </xsl:stylesheet>"

newxmlstr = "Output_R.xml"

# LOADING XML & XSLT FILES
xmlfile[["async"]] <- FALSE
xmlfile$LoadXML(xmlstr)

xslfile[["async"]] <- FALSE
xslfile$LoadXML(xslstr)

# TRANSFORMING XML FILE USING XSLT INTO NEW FILE
xmlfile$transformNodeToObject(xslfile, newxmlfile)
newxmlfile$Save(newxmlstr)

# SHOW OUTPUT
doc<-xmlParse(newxmlstr)
doc

# UNINITIALIZE MSXML OBJECTS
xmlfile <- NULL
xslfile <- NULL
newxmlfile <- NULL

R (calling the Python script below via the command line, assuming the python executable is on the PATH on Windows machines)

library(XML)

# ADD NEEDED NODE(S)
xml <- "<xml><info>Some info here</info><node id='1'/><node id='2'/><node id='3'/><comment>Some comment here</comment></xml>"
doc <- xmlParse(xml)
doc <- getNodeSet(doc, "//xml")
new_node <- newXMLNode("node", attrs = c(id = 4))
doc <- addChildren(doc[[1]], new_node)
out <- saveXML(doc, file="NewNode.xml")

# COMMAND LINE CALL
shell(paste("python", shQuote("C:\\Path\\To\\PythonScript.py")))

Python (parsing the XML file written by R and writing the transformed XML to a new file)

import os
import lxml.etree as ET

cd = os.path.dirname(os.path.abspath(__file__))

xsl = """<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:output omit-xml-declaration="yes" indent="yes"/>
      <xsl:strip-space elements="*"/>
      <xsl:template match="xml">
        <xml>
          <xsl:copy-of select="info"/>
          <xsl:copy-of select="node"/>
          <xsl:copy-of select="comment"/>
        </xml>
      </xsl:template>
      </xsl:stylesheet>"""

# PARSING XML AND XSL    
dom = ET.parse(os.path.join(cd,'NewNode.xml'))
xslobj = ET.fromstring(xsl)
transform = ET.XSLT(xslobj)
newdom = transform(dom)

# SERIALIZE RESULT
tree_out = ET.tostring(newdom, encoding='UTF-8', pretty_print=True, xml_declaration=True)

# SHOW OUTPUT
print(tree_out.decode('UTF-8'))

# OUTPUT TO FILE
with open(os.path.join(cd, 'Output_py.xml'), 'wb') as xmlfile:
    xmlfile.write(tree_out)

Final Output (of both approaches)

<?xml version="1.0"?>
<xml>
  <info>Some info here</info>
  <node id="1"/>
  <node id="2"/>
  <node id="3"/>
  <node id="4"/>
  <comment>Some comment here</comment>
</xml>

While all the above may seem complex, just remember programming languages are tools that do particular things well including general-purpose (Java, C, Python) and special-purpose (SQL, XSLT) languages. R is really a statistical computing language and may not be the best tool to parse and transform XML documents. A good handyman doesn't do every job with just a hammer!
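
For later readers: the xslt package on CRAN (built on xml2 and libxslt) may let you apply the same stylesheet without leaving R. The following is only a minimal sketch, assuming both xml2 and xslt are installed; the input string is the document after the new node has already been appended at the end.

```r
library(xml2)
library(xslt)

# Document as it looks after <node id='4'/> was appended at the end
doc <- read_xml("<xml><info>Some info here</info><node id='1'/><node id='2'/><node id='3'/><comment>Some comment here</comment><node id='4'/></xml>")

# Same stylesheet as above: copy <info>, then every <node>, then <comment>
style <- read_xml('<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes" indent="yes"/>
  <xsl:strip-space elements="*"/>
  <xsl:template match="xml">
    <xml>
      <xsl:copy-of select="info"/>
      <xsl:copy-of select="node"/>
      <xsl:copy-of select="comment"/>
    </xml>
  </xsl:template>
</xsl:stylesheet>')

# Apply the transformation and print the reordered document
result <- xml_xslt(doc, style)
cat(as.character(result))
```

This avoids the dependency on Microsoft Office or an external Python installation, at the cost of two extra R packages.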

Upvotes: 1

nicola

Reputation: 24490

You can use xmlChildren<- and reorder the elements through standard R subsetting:

library(XML)

xml <- "<xml><info>Some info here</info><node id='1'/><node id='2'/><node id='3'/><comment>Some comment here</comment></xml>"
doc <- xmlParse(xml)
doc <- getNodeSet(doc, "//xml")
new_node <- newXMLNode("node", attrs = c(id = 4))
# Append the new node (position 6), then move it in front of <comment> (position 5)
xmlChildren(doc[[1]]) <- c(xmlChildren(doc[[1]]), node = new_node)[c(1:4, 6, 5)]
doc[[1]]
#<xml>
#   <info>Some info here</info> 
#   <node id="1"/>
#   <node id="2"/>
#   <node id="3"/>
#   <node id="4"/>
#   <comment>Some comment here</comment>
#</xml> 
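
The hard-coded index c(1:4, 6, 5) only fits this exact document. If the required order can be written as a vector of element names, the index could be computed instead. This is a sketch under that assumption; the `wanted` vector and the `root` and `kids` variables are names I made up for illustration.

```r
library(XML)

xml <- "<xml><info>Some info here</info><node id='1'/><node id='2'/><node id='3'/><comment>Some comment here</comment></xml>"
doc <- xmlParse(xml)
root <- xmlRoot(doc)
new_node <- newXMLNode("node", attrs = c(id = 4))

# Desired order of element names (hypothetical; adjust to your schema)
wanted <- c("info", "node", "comment")

# Append the new node, then sort all children by their position in `wanted`.
# order() leaves ties in their original order, so nodes 1..4 stay in sequence.
kids <- c(xmlChildren(root), node = new_node)
xmlChildren(root) <- kids[order(match(names(kids), wanted))]
root
```

Elements whose names are not listed in `wanted` get NA from match() and are placed last by order(), so list every element name you expect.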

Upvotes: 1
