Reputation: 617
I know this would probably be easiest to do with XSL, but I need to do it in R. I have the following kind of XML file:
<xml>
<info>Some info here</info>
<node id='1'/>
<node id='2'/>
<node id='3'/>
<comment>Some comment here</comment>
</xml>
If I need to add a new node, I do it like this:
library(XML)
xml <- "<xml><info>Some info here</info><node id='1'/><node id='2'/><node id='3'/><comment>Some comment here</comment></xml>"
doc <- xmlParse(xml)
doc <- getNodeSet(doc, "//xml")
new_node <- newXMLNode("node", attrs = c(id = 4))
# Is there a way to specify here where to insert the node?
doc <- addChildren(doc[[1]], new_node)
# Or can one sort it somehow once it is there?
However, this adds the new node to the end of the document, like this:
<xml>
<info>Some info here</info>
<node id="1"/>
<node id="2"/>
<node id="3"/>
<comment>Some comment here</comment>
<node id="4"/>
</xml>
I would need to maintain the original structure, which contains the elements in a specific order. For the sake of clarity, here is what I need:
<xml>
<info>Some info here</info>
<node id="1"/>
<node id="2"/>
<node id="3"/>
<node id="4"/>
<comment>Some comment here</comment>
</xml>
As I tried to explain in the title, I think there are two solutions to this problem. One would be to add the new node between the last <node> and <comment>. The other would be to insert it at the end, then specify the right order and sort the document accordingly.
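For the first option, a minimal sketch, assuming the XML package's addSibling() can place a node directly after an existing one. The variable name last_node and the XPath "/xml/node" are only illustrative and are not part of the code above:

```r
library(XML)

xml <- "<xml><info>Some info here</info><node id='1'/><node id='2'/><node id='3'/><comment>Some comment here</comment></xml>"
doc <- xmlParse(xml)

# Locate the last existing <node> element under the root
nodes <- getNodeSet(doc, "/xml/node")
last_node <- nodes[[length(nodes)]]

# Create the new node and insert it right after the last <node>
new_node <- newXMLNode("node", attrs = c(id = 4))
addSibling(last_node, new_node, after = TRUE)

# Internal nodes are modified in place, so doc now holds the new node
doc
```

Because the insertion point is found by XPath rather than by position, this should keep working if the number of <node> elements changes.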
Upvotes: 2
Views: 890
Reputation: 107652
Here is an XSLT solution in R, in case your modification becomes more complex (or customized), and for future readers. Because the XML package does not include an XSLT processor, R can call an installed one through RDCOMClient or the command line. Most general-purpose languages maintain XSLT libraries (Java, C#, PHP, Perl, Python, VB), so R can leverage them accordingly.
The examples below show a VBA-style approach and a Python approach. The first uses the MSXML object, which comes with a Microsoft Office installation. The second runs the open-source Python from the command line using its lxml module.
R (using RDCOMClient, assuming you have Microsoft Office installed on PC; can easily translate as an Excel macro)
library(XML)
library(RDCOMClient)
# ADD NEEDED NODE(S)
xml <- "<xml><info>Some info here</info><node id='1'/><node id='2'/><node id='3'/><comment>Some comment here</comment></xml>"
doc <- xmlParse(xml)
doc <- getNodeSet(doc, "//xml")
new_node <- newXMLNode("node", attrs = c(id = 4))
doc <- addChildren(doc[[1]], new_node)
# SERIALIZE THE MODIFIED NODE TO A STRING FOR MSXML
xmlstr <- saveXML(doc)
# INITIALIZE MSXML OBJECTS
xmlfile = COMCreate("MSXML2.DOMDocument")
xslfile = COMCreate("MSXML2.DOMDocument")
newxmlfile = COMCreate("MSXML2.DOMDocument")
# XSLT TRANSFORMATION
xslstr = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>
<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">
<xsl:output omit-xml-declaration=\"yes\" indent=\"yes\"/>
<xsl:strip-space elements=\"*\"/>
<xsl:template match=\"xml\">
<xml>
<xsl:copy-of select=\"info\"/>
<xsl:copy-of select=\"node\"/>
<xsl:copy-of select=\"comment\"/>
</xml>
</xsl:template>
</xsl:stylesheet>"
newxmlstr = "Output_R.xml"
# LOADING XML & XSLT FILES
# SET COM PROPERTIES WITH [[<- (xmlfile.async = FALSE WOULD ONLY CREATE AN R VARIABLE)
xmlfile[["async"]] <- FALSE
xmlfile$LoadXML(xmlstr)
xslfile[["async"]] <- FALSE
xslfile$LoadXML(xslstr)
# TRANSFORMING XML FILE USING XSLT INTO NEW FILE
xmlfile$transformNodeToObject(xslfile, newxmlfile)
newxmlfile$Save(newxmlstr)
# SHOW OUTPUT
doc<-xmlParse(newxmlstr)
doc
# UNINITIALIZE MSXML OBJECTS
xmlfile <- NULL
xslfile <- NULL
newxmlfile <- NULL
R (calling the Python script below via command line, assuming the python executable is on the PATH for PC machines)
library(XML)
# ADD NEEDED NODE(S)
xml <- "<xml><info>Some info here</info><node id='1'/><node id='2'/><node id='3'/><comment>Some comment here</comment></xml>"
doc <- xmlParse(xml)
doc <- getNodeSet(doc, "//xml")
new_node <- newXMLNode("node", attrs = c(id = 4))
doc <- addChildren(doc[[1]], new_node)
out <- saveXML(doc, file="NewNode.xml")
# COMMAND LINE CALL
shell(paste("python", shQuote("C:\\Path\\To\\PythonScript.py")))
Python (parsing from file the outputted XML from R and outputting resultant xml to file)
import os
import lxml.etree as ET
cd = os.path.dirname(os.path.abspath(__file__))
# XSLT STYLESHEET AS A TRIPLE-QUOTED STRING (NO ESCAPES OR LINE CONTINUATIONS NEEDED)
xsl = """<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="xml">
<xml>
<xsl:copy-of select="info"/>
<xsl:copy-of select="node"/>
<xsl:copy-of select="comment"/>
</xml>
</xsl:template>
</xsl:stylesheet>"""
# PARSING XML AND XSL
dom = ET.parse(os.path.join(cd,'NewNode.xml'))
xslobj = ET.fromstring(xsl)
transform = ET.XSLT(xslobj)
newdom = transform(dom)
# SERIALIZE RESULT TO BYTES
tree_out = ET.tostring(newdom, encoding='UTF-8', pretty_print=True, xml_declaration=True)
# SHOW OUTPUT
print(tree_out.decode('UTF-8'))
# OUTPUT TO FILE
with open(os.path.join(cd, 'Output_py.xml'), 'wb') as xmlfile:
    xmlfile.write(tree_out)
Final Output (of both approaches)
<?xml version="1.0"?>
<xml>
<info>Some info here</info>
<node id="1"/>
<node id="2"/>
<node id="3"/>
<node id="4"/>
<comment>Some comment here</comment>
</xml>
While all of the above may seem complex, remember that programming languages are tools, each good at particular things, whether general-purpose (Java, C, Python) or special-purpose (SQL, XSLT). R is primarily a statistical computing language and may not be the best tool for parsing and transforming XML documents. A good handyman doesn't do every job with just a hammer!
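The same stylesheet can also be applied from within R, as a minimal sketch, assuming the CRAN packages xml2 and xslt are installed. Neither package is used in the code above; read_xml() and xml_xslt() come from those packages, and the input string here already contains the appended <node id='4'/> at the end:

```r
library(xml2)
library(xslt)

# Input with the new node appended after <comment>, as addChildren() leaves it
xml <- read_xml("<xml><info>Some info here</info><node id='1'/><node id='2'/><node id='3'/><comment>Some comment here</comment><node id='4'/></xml>")

# Same stylesheet as above: copy children in the order info, node, comment
style <- read_xml('<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes" indent="yes"/>
  <xsl:strip-space elements="*"/>
  <xsl:template match="xml">
    <xml>
      <xsl:copy-of select="info"/>
      <xsl:copy-of select="node"/>
      <xsl:copy-of select="comment"/>
    </xml>
  </xsl:template>
</xsl:stylesheet>')

# Apply the transformation and print the reordered document
result <- xml_xslt(xml, style)
cat(as.character(result))
```

This avoids the dependency on Microsoft Office or a separate Python installation, at the cost of working with xml2 objects rather than the XML package's nodes.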
Upvotes: 1
Reputation: 24490
You can use xmlChildren<- and reorder the elements through standard R subsetting:
library(XML)
xml <- "<xml><info>Some info here</info><node id='1'/><node id='2'/><node id='3'/><comment>Some comment here</comment></xml>"
doc <- xmlParse(xml)
doc <- getNodeSet(doc, "//xml")
new_node <- newXMLNode("node", attrs = c(id = 4))
# Append the new node (position 6), then move it ahead of <comment> (position 5)
xmlChildren(doc[[1]]) <- c(xmlChildren(doc[[1]]), node = new_node)[c(1:4, 6, 5)]
doc[[1]]
#<xml>
# <info>Some info here</info>
# <node id="1"/>
# <node id="2"/>
# <node id="3"/>
# <node id="4"/>
# <comment>Some comment here</comment>
#</xml>
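The index vector c(1:4,6,5) is specific to this document. Here is a minimal sketch of deriving the order from the element names instead, assuming xmlChildren<- accepts a reordered list of the existing children as in the code above. The vector wanted and the other variable names are illustrative:

```r
library(XML)

xml <- "<xml><info>Some info here</info><node id='1'/><node id='2'/><node id='3'/><comment>Some comment here</comment></xml>"
doc <- xmlParse(xml)
root <- xmlRoot(doc)

# Append the new node at the end first (internal nodes are modified in place)
addChildren(root, newXMLNode("node", attrs = c(id = 4)))

# Desired order of element names; order() keeps ties in their original order,
# so the <node> elements stay sorted as they were appended
wanted <- c("info", "node", "comment")
kids <- xmlChildren(root)
ord <- order(match(names(kids), wanted))

# Reassign the children in the computed order
xmlChildren(root) <- kids[ord]
root
```

Any element whose name is not listed in wanted gets NA from match() and is placed last by order().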
Upvotes: 1