Reputation: 145
I have the following XML file that I want to parse using R with the following code:
library(XML)

Fun2 <- function(xdata) {
  # For one node: child values, plus the tag name and its attributes
  dumFun <- function(x) {
    xname <- xmlName(x)
    xattrs <- xmlAttrs(x)
    c(sapply(xmlChildren(x), xmlValue), name = xname, xattrs)
  }
  dum <- xmlParse(xdata)
  # One row per <name> node
  as.data.frame(t(xpathSApply(dum, "//*/name", dumFun)), stringsAsFactors = FALSE)
}
I want to add another column containing the entity ID (52 and 53 in the XML). The issue is that there are only 2 ID values but 6 "name" tags. Any help is appreciated.
<?xml version='1.0' encoding='UTF-8'?>
<gwl>
<version>20161109152411</version>
<entities>
<entity id="52" version="1234">
<names>
<name type="primary">Carl A.</name>
<name type="alt">David A.</name>
<name type="alt">Daniel A.</name>
</names>
</entity>
<entity id="53" version="12346">
<names>
<name type="primary">Carl B.</name>
<name type="alt">David B.</name>
<name type="alt">Daniel B.</name>
</names>
</entity>
</entities>
</gwl>
The desired output is like below:
-----------------------------------
|Column1 | Column2 | Column3|
-----------------------------------
|52 | Carl A. | primary|
-----------------------------------
|52 | David A. | alt |
-----------------------------------
|52 | Daniel A.| alt |
-----------------------------------
|53 | Carl B. | primary|
-----------------------------------
|53 | David B. | alt |
-----------------------------------
|53 | Daniel B.| alt |
-----------------------------------
Upvotes: 1
Views: 276
Reputation: 12703
EDIT: Based on the edited desired output
Get the ID values, then for each ID loop through the matching node set and extract the value and attributes of each name node. Finally, combine everything with rbind and convert it to a data frame.
df1 <- do.call( 'rbind', lapply( xmlSApply(doc["//entity"], function(x) xmlGetAttr(x, "id")),
function(x) {
t( xmlSApply( doc[ paste("//entity[@id=", x, "]//name", sep = "") ],
function( y ) c(x, xmlValue(y), xmlAttrs(y)) ))
}))
colnames( df1 ) <- c( 'Column1', 'Column2', 'Column3' )
df1 <- data.frame( df1, stringsAsFactors = FALSE )
df1
# Column1 Column2 Column3
# 1 52 Carl A. primary
# 2 52 David A. alt
# 3 52 Daniel A. alt
# 4 53 Carl B. primary
# 5 53 David B. alt
# 6 53 Daniel B. alt
Data:
library(XML)
doc <- xmlParse('<gwl>
<version>20161109152411</version>
<entities>
<entity id="52" version="1234">
<names>
<name type="primary">Carl A.</name>
<name type="alt">David A.</name>
<name type="alt">Daniel A.</name>
</names>
</entity>
<entity id="53" version="12346">
<names>
<name type="primary">Carl B.</name>
<name type="alt">David B.</name>
<name type="alt">Daniel B.</name>
</names>
</entity>
</entities>
</gwl>')
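As a possible alternative (a sketch, not part of the approach above), you could select every name node once and read the ID from its enclosing entity, instead of running one XPath query per ID. This assumes each name sits exactly two levels below its entity (entity > names > name), as in the sample data. The names xml_text, name_nodes and df2 are made up for illustration.

```r
library(XML)

# Same sample document as in the question, repeated so the snippet runs on its own
xml_text <- '<gwl>
<version>20161109152411</version>
<entities>
<entity id="52" version="1234">
<names>
<name type="primary">Carl A.</name>
<name type="alt">David A.</name>
<name type="alt">Daniel A.</name>
</names>
</entity>
<entity id="53" version="12346">
<names>
<name type="primary">Carl B.</name>
<name type="alt">David B.</name>
<name type="alt">Daniel B.</name>
</names>
</entity>
</entities>
</gwl>'

doc <- xmlParse(xml_text, asText = TRUE)

# All <name> nodes, in document order
name_nodes <- getNodeSet(doc, "//entity/names/name")

# Assumption: the grandparent of each <name> is its <entity>
df2 <- data.frame(
  Column1 = sapply(name_nodes, function(n) xmlGetAttr(xmlParent(xmlParent(n)), "id")),
  Column2 = sapply(name_nodes, xmlValue),
  Column3 = sapply(name_nodes, function(n) xmlGetAttr(n, "type")),
  stringsAsFactors = FALSE
)
df2
```

Since every row is built from a single name node, the 2 IDs versus 6 names mismatch does not come up: the ID is simply looked up once per name.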
Upvotes: 1