user1377963

Reputation: 115

Working with XML nodes in "R"

Given the following XML file:

<XML>
  <A>
    <B>
      <ID>1</ID>
    </B>
    <C>
      <D>10</D>
      <D>20</D>
    </C>
  </A>
  <A>
    <B>
      <ID>2</ID>
    </B>
    <C>
      <D>30</D>
      <D>50</D>
    </C>
  </A>
</XML>

With the following R code I can read in the XML file:

library(XML)
xmlobj <- xmlTreeParse("my_file.xml", useInternalNodes = TRUE)

First, I would like to get a list of the XML nodes "A". I can do this with

node_a <- xpathSApply(doc = xmlobj, path = "//A", xmlChildren)

and the result (node_a) looks like this:

  [,1] [,2]
B ?    ?   
C ?    ?   

In a second step, I would like to call a function on each of the XML nodes extracted in step 1 that returns a list of the "D" nodes below it. As a first attempt, I tried to get the children of "C" for the first "A" element from step 1:

xmlChildren(asXMLNode(node_a["C",1]))

But the result is:

named list()
attr(,"class")
[1] "XMLNodeList"

Finally, I would like the values of D grouped by A: one list with the values of all D elements that belong to the A with ID 1, and another list with the values of all D elements that belong to the A with ID 2.

Upvotes: 1

Views: 1036

Answers (2)

jlhoward

Reputation: 59355

Assuming the XML text at the beginning of your question is stored in a character string called xmlText,

library(XML)
xml <- xmlParse(xmlText, asText = TRUE)
lapply(xml["//A//C"], function(node) sapply(xmlElementsByTagName(node, "D"), xmlValue))
# [[1]]
#    D    D 
# "10" "20" 
#
# [[2]]
#    D    D 
# "30" "50" 

If you want integers instead of character and you don't want the names,

get.D <- function(node) {
  unname(sapply(xmlElementsByTagName(node, "D"), function(n) as.integer(xmlValue(n))))
}
lapply(xml["//A//C"], get.D)
# [[1]]
# [1] 10 20
#
# [[2]]
# [1] 30 50
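
If you also want each result labelled with the ID of its A element, one possible variant is sketched below. It assumes every A has exactly one B/ID child and that the D elements sit directly under C, as in the sample file; the names xml_text, a_nodes and d_by_id are only illustrative.

```r
library(XML)

# Sample document from the question, held in a string (illustrative name)
xml_text <- "<XML>
  <A><B><ID>1</ID></B><C><D>10</D><D>20</D></C></A>
  <A><B><ID>2</ID></B><C><D>30</D><D>50</D></C></A>
</XML>"
doc <- xmlParse(xml_text, asText = TRUE)

# One node per A element
a_nodes <- getNodeSet(doc, "//A")

# For each A, evaluate a path relative to that node ("./" keeps the search inside it)
d_by_id <- lapply(a_nodes, function(a) as.integer(xpathSApply(a, "./C/D", xmlValue)))

# Label each entry with the ID found under the same A (assumes one ID per A)
names(d_by_id) <- sapply(a_nodes, function(a) xpathSApply(a, "./B/ID", xmlValue))

d_by_id[["1"]]  # D values for the A with ID 1
d_by_id[["2"]]  # D values for the A with ID 2
```

The leading "./" matters here: a path starting with "//" would search the whole document again rather than only the current A.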

Upvotes: 1

lawyeR

Reputation: 7654

I'm not sure which intermediate steps you want, but to get the values of all D elements in a single vector,

node_a <- xpathSApply(doc = xmlobj, path = "//D", xmlValue, trim = TRUE)

> node_a
[1] "10" "20" "30" "50"
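
To restrict this to the D values of one particular A, an XPath predicate on the ID may be enough. The following is a minimal sketch, assuming the structure of the sample file (ID under B, D under C); d_for_id and xml_text are hypothetical names introduced only for illustration.

```r
library(XML)

# Sample document from the question, held in a string (illustrative name)
xml_text <- "<XML>
  <A><B><ID>1</ID></B><C><D>10</D><D>20</D></C></A>
  <A><B><ID>2</ID></B><C><D>30</D><D>50</D></C></A>
</XML>"
doc <- xmlParse(xml_text, asText = TRUE)

# Hypothetical helper: values of all D elements under the A whose B/ID equals `id`
d_for_id <- function(doc, id) {
  path <- sprintf("//A[B/ID='%s']/C/D", id)  # predicate selects the matching A
  xpathSApply(doc, path, xmlValue, trim = TRUE)
}

d_for_id(doc, 1)
# [1] "10" "20"
d_for_id(doc, 2)
# [1] "30" "50"
```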

Upvotes: 1
