C. Tanaka

Reputation: 145

Parsing XML with a different number of subnodes with the same name in R

I have an XML file (shown further below) that I want to parse in R with the following code:

library(XML)

Fun2 <- function(xdata) {
    # Return the child values, the tag name, and the attributes of one node
    dumFun <- function(x) {
        xname <- xmlName(x)
        xattrs <- xmlAttrs(x)
        c(sapply(xmlChildren(x), xmlValue), name = xname, xattrs)
    }
    dum <- xmlParse(xdata)
    # One row per <name> node
    as.data.frame(t(xpathSApply(dum, "//*/name", dumFun)), stringsAsFactors = FALSE)
}

I want to add another column containing the entity ID (52 and 53 in the XML). The problem is that there are only 2 ID values but 6 "name" tags. Any help is appreciated.

<?xml version='1.0' encoding='UTF-8'?>
<gwl>
  <version>20161109152411</version>
  <entities>
    <entity id="52" version="1234">
      <names>
        <name type="primary">Carl A.</name>
        <name type="alt">David A.</name>
        <name type="alt">Daniel A.</name>
      </names>
    </entity>

    <entity id="53" version="12346">
      <names>
        <name type="primary">Carl B.</name>
        <name type="alt">David B.</name>
        <name type="alt">Daniel B.</name>
      </names>
    </entity>
  </entities>
</gwl>

The desired output is like below:

-----------------------------------
|Column1      | Column2  | Column3|
-----------------------------------
|52           | Carl A.  | primary|
-----------------------------------
|52           | David A. | alt    |
-----------------------------------
|52           | Daniel A.| alt    |
-----------------------------------
|53           | Carl B.  | primary|
-----------------------------------
|53           | David B. | alt    |
-----------------------------------
|53           | Daniel B.| alt    |
-----------------------------------

Upvotes: 1

Views: 276

Answers (1)

Sathish

Reputation: 12703

EDIT: Based on the edited desired output

Get the ID values, then loop over them: for each ID, select its name nodes and take the xmlValue and the attributes of each one. Finally, combine the results with rbind and convert them to a data frame.

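# `doc` is the parsed document defined under "Data" below.
# Collect the id attribute of every <entity> node.
ids <- xmlSApply(doc["//entity"], function(x) xmlGetAttr(x, "id"))

# For each id, build one row (id, name text, type attribute) per <name> node.
df1 <- do.call('rbind', lapply(ids, function(x) {
  t(xmlSApply(doc[paste0("//entity[@id='", x, "']//name")],
              function(y) c(x, xmlValue(y), xmlAttrs(y))))
}))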

colnames( df1 ) <- c( 'Column1', 'Column2', 'Column3' )
df1 <- data.frame( df1, stringsAsFactors = FALSE )
df1
#   Column1   Column2 Column3
# 1      52   Carl A. primary
# 2      52  David A.     alt
# 3      52 Daniel A.     alt
# 4      53   Carl B. primary
# 5      53  David B.     alt
# 6      53 Daniel B.     alt 

Data:

library(XML)
doc <- xmlParse('<gwl>
                    <version>20161109152411</version>
                    <entities>
                    <entity id="52" version="1234">
                    <names>
                    <name type="primary">Carl A.</name>
                    <name type="alt">David A.</name>
                    <name type="alt">Daniel A.</name>
                    </names>
                    </entity>
                    <entity id="53" version="12346">
                    <names>
                    <name type="primary">Carl B.</name>
                    <name type="alt">David B.</name>
                    <name type="alt">Daniel B.</name>
                    </names>
                    </entity>
                    </entities>
                    </gwl>')    
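
A possible alternative, sketched here under the assumption that every name node sits exactly two levels below its entity (entity > names > name), is to select all name nodes once and walk up to the owning entity with xmlParent, so no per-ID XPath query is needed. The variable names (xml_text, name_nodes, df2) are only illustrative.

```r
library(XML)

# Same sample document as in the question
xml_text <- '<gwl>
  <version>20161109152411</version>
  <entities>
    <entity id="52" version="1234">
      <names>
        <name type="primary">Carl A.</name>
        <name type="alt">David A.</name>
        <name type="alt">Daniel A.</name>
      </names>
    </entity>
    <entity id="53" version="12346">
      <names>
        <name type="primary">Carl B.</name>
        <name type="alt">David B.</name>
        <name type="alt">Daniel B.</name>
      </names>
    </entity>
  </entities>
</gwl>'
doc <- xmlParse(xml_text, asText = TRUE)

# Select every <name> node in document order
name_nodes <- getNodeSet(doc, "//entity/names/name")

df2 <- data.frame(
  # <name> -> <names> -> <entity>: two steps up, then read the id attribute
  Column1 = sapply(name_nodes, function(n) xmlGetAttr(xmlParent(xmlParent(n)), "id")),
  # Text content of the <name> node
  Column2 = sapply(name_nodes, xmlValue),
  # The type attribute of the <name> node
  Column3 = sapply(name_nodes, xmlGetAttr, "type"),
  stringsAsFactors = FALSE
)
df2
```

Because each column is read from the name node itself or from its ancestors, this should keep every name aligned with its ID even when entities have different numbers of names. If the nesting depth varies, the xmlParent calls would need to be replaced by an ancestor lookup.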

Upvotes: 1
