Reputation: 115
Given the following XML file:
<XML>
<A>
<B>
<ID>1</ID>
</B>
<C>
<D>10</D>
<D>20</D>
</C>
</A>
<A>
<B>
<ID>2</ID>
</B>
<C>
<D>30</D>
<D>50</D>
</C>
</A>
</XML>
With the following R code I can read in the XML file:
library(XML)
xmlobj <- xmlTreeParse("my_file.xml", useInternalNodes = TRUE)
First, I would like to get a list of the XML nodes "A". I can do this with
node_a <- xpathSApply(doc = xmlobj, path = "//A", xmlChildren)
and the result (node_a) looks like this:
  [,1] [,2]
B ?    ?
C ?    ?
In a second step, I would like to call a function on each of the XML nodes extracted in step one, returning a list of the XML nodes "D". I tried to get the children of "C" for the first "A" element from step one:
xmlChildren(asXMLNode(node_a["C",1]))
But the result is:
named list()
attr(,"class")
[1] "XMLNodeList"
Finally, I would like to have the values of D separately for each A (one list of D values for A with ID 1 and one list of D values for A with ID 2).
In other words, I want one list with the values of all D elements under the A element with ID 1, and another list with the values of all D elements under the A element with ID 2.
Upvotes: 1
Views: 1036
Reputation: 59355
Calling the XML text at the beginning of your question xmlText,
library(XML)
xml <- xmlParse(xmlText, asText = TRUE)
lapply(xml["//A//C"], function(node) sapply(xmlElementsByTagName(node, "D"), xmlValue))
# [[1]]
# D D
# "10" "20"
#
# [[2]]
# D D
# "30" "50"
If you want integers instead of character strings and you don't want the names,
get.D <- function(node) unname(sapply(xmlElementsByTagName(node, "D"), function(n) as.integer(xmlValue(n))))
lapply(xml["//A//C"], get.D)
# [[1]]
# [1] 10 20
#
# [[2]]
# [1] 30 50
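As for why the attempt in the question returns an empty list: as far as I can tell, node_a is a matrix of mode list, so single-bracket indexing (node_a["C",1]) gives back a list of length one rather than the node itself, and asXMLNode() then appears to build a fresh, childless node from it. Double-bracket indexing should return the node directly. A minimal sketch, assuming the XML from the question is held in a string (xml_text, doc and c_node are names made up for this illustration):

```r
library(XML)

# The XML from the question, inlined as a string (illustrative name)
xml_text <- paste0(
  "<XML>",
  "<A><B><ID>1</ID></B><C><D>10</D><D>20</D></C></A>",
  "<A><B><ID>2</ID></B><C><D>30</D><D>50</D></C></A>",
  "</XML>"
)
doc <- xmlParse(xml_text, asText = TRUE)

# Same call as in the question: a list-matrix with rows B and C
node_a <- xpathSApply(doc, "//A", xmlChildren)

# [[ ]] extracts the node itself; [ ] would wrap it in a list
c_node <- node_a[["C", 1]]

# Values of the D children of the first C node
d_values <- unname(sapply(xmlChildren(c_node), xmlValue))
print(d_values)
```

With the node in hand, asXMLNode() is no longer needed.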
Upvotes: 1
Reputation: 7654
I'm not sure of the intermediate steps you want, but to get the values of D,
node_a <- xpathSApply(doc = xmlobj, path = "//D", xmlValue, trim = TRUE)
> node_a
[1] "10" "20" "30" "50"
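Note that the call above collects every D value into a single vector. If the values should stay grouped per A element, one possible approach is to run a relative XPath query on each A node and name the results by ID. This is only a sketch, assuming the XML from the question is held in a string; xml_text, a_nodes, ids and d_by_id are names invented for this example:

```r
library(XML)

# The XML from the question, inlined as a string (illustrative name)
xml_text <- paste0(
  "<XML>",
  "<A><B><ID>1</ID></B><C><D>10</D><D>20</D></C></A>",
  "<A><B><ID>2</ID></B><C><D>30</D><D>50</D></C></A>",
  "</XML>"
)
doc <- xmlParse(xml_text, asText = TRUE)

# One entry per A element
a_nodes <- getNodeSet(doc, "//A")

# ID of each A element, queried relative to the node
ids <- sapply(a_nodes, function(a) xpathSApply(a, "./B/ID", xmlValue))

# D values of each A element as integers, also queried relative to the node
d_by_id <- lapply(a_nodes, function(a) {
  as.integer(xpathSApply(a, "./C/D", xmlValue))
})
names(d_by_id) <- ids

print(d_by_id)
```

Here d_by_id[["1"]] holds the D values for the A element with ID 1, and d_by_id[["2"]] those for ID 2.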
Upvotes: 1