jburkhardt

Reputation: 675

XML to Dataframe

Does anyone know how to convert the following XML into an R data frame?

    <?xml version="1.0"?>
    <soap:Envelope>
      <soap:Body>
       <getCampaignsResponse>
                <getCampaignsResult>
                    <campaign>
                        <categoryBids>
                                <categoryBid>
                                    <campaignCategoryUID>1234</campaignCategoryUID>
                                    <campaignID>1211</campaignID>
                                    <categoryID>1254</categoryID>
                                    <selected>true</selected>
                                    <bidInformation>
                                      <biddingStrategy>Cpc</biddingStrategy>
                                      <cpcBid>
                                        <cpc>0.5</cpc>
                                      </cpcBid>
                                      <cpaBid xsi:nil="true"/>
                                    </bidInformation>
                                </categoryBid>
                                <categoryBid>
                                      <campaignCategoryUID>5487</campaignCategoryUID>
                                      <campaignID>3244</campaignID>
                                      <categoryID>1234</categoryID>
                                      <selected>true</selected>
                                      <bidInformation>
                                        <biddingStrategy>Cpc</biddingStrategy>
                                        <cpcBid>
                                          <cpc>0.2</cpc>
                                        </cpcBid>
                                        <cpaBid xsi:nil="true"/>
                                    </bidInformation>
                                </categoryBid>
                      </categoryBids>
                  </campaign>
              </getCampaignsResult>
          </getCampaignsResponse>
      </soap:Body>
  </soap:Envelope>

The class of the XML Object is:

> str(data)  
Classes 'XMLInternalDocument', 'XMLAbstractDocument' <externalptr> 

The data frame should have the following columns:
campaignCategoryUID
campaignID
categoryID
biddingStrategy
cpc

I couldn't get useful results with xmlToDataFrame or xmlToList. Any help is really appreciated!

Upvotes: 0

Views: 618

Answers (1)

hrbrmstr

Reputation: 78852

You have to extract the nodes by hand with something like xpathSApply. You will probably also need to change the way you parse the response, since the sample uses the soap: and xsi: prefixes without any namespace definitions:

library(XML)

xml <- '<?xml version="1.0"?>
    <soap:Envelope>
      <soap:Body>
       <getCampaignsResponse>
                <getCampaignsResult>
                    <campaign>
                        <categoryBids>
                                <categoryBid>
                                    <campaignCategoryUID>1234</campaignCategoryUID>
                                    <campaignID>1211</campaignID>
                                    <categoryID>1254</categoryID>
                                    <selected>true</selected>
                                    <bidInformation>
                                      <biddingStrategy>Cpc</biddingStrategy>
                                      <cpcBid>
                                        <cpc>0.5</cpc>
                                      </cpcBid>
                                      <cpaBid xsi:nil="true"/>
                                    </bidInformation>
                                </categoryBid>
                                <categoryBid>
                                      <campaignCategoryUID>5487</campaignCategoryUID>
                                      <campaignID>3244</campaignID>
                                      <categoryID>1234</categoryID>
                                      <selected>true</selected>
                                      <bidInformation>
                                        <biddingStrategy>Cpc</biddingStrategy>
                                        <cpcBid>
                                          <cpc>0.2</cpc>
                                        </cpcBid>
                                        <cpaBid xsi:nil="true"/>
                                    </bidInformation>
                                </categoryBid>
                      </categoryBids>
                  </campaign>
              </getCampaignsResult>
          </getCampaignsResponse>
      </soap:Body>
  </soap:Envelope>'

# Parse the string as XML text and keep the root node for the XPath queries
doc <- xmlRoot(xmlTreeParse(xml, asText = TRUE, useInternalNodes = TRUE))

# One column per node name; each query returns one value per <categoryBid>
data <- data.frame(campaignCategoryUID=xpathSApply(doc, "//campaignCategoryUID", xmlValue),
                   campaignID=xpathSApply(doc, "//campaignID", xmlValue),
                   categoryID=xpathSApply(doc, "//categoryID", xmlValue),
                   biddingStrategy=xpathSApply(doc, "//biddingStrategy", xmlValue),
                   cpc=xpathSApply(doc, "//cpc", xmlValue))

data

##   campaignCategoryUID campaignID categoryID biddingStrategy cpc
## 1                1234       1211       1254             Cpc 0.5
## 2                5487       3244       1234             Cpc 0.2
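
If the real response does declare its namespaces (an assumption on my part; the sample in the question doesn't), a bare path like "//campaignID" may stop matching, because an unprefixed XPath name only matches elements in no namespace. A rough sketch of two ways around that, using a made-up namespace URI (http://example.com/api) and a cut-down document:

```r
library(XML)

# Cut-down response; the default namespace URI below is invented for illustration
ns_xml <- '<?xml version="1.0"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <getCampaignsResponse xmlns="http://example.com/api">
      <categoryBid><campaignID>1211</campaignID></categoryBid>
      <categoryBid><campaignID>3244</campaignID></categoryBid>
    </getCampaignsResponse>
  </soap:Body>
</soap:Envelope>'

ns_doc <- xmlParse(ns_xml, asText = TRUE)

# Unprefixed name test: campaignID now lives in a namespace, so nothing matches
plain_hits <- xpathSApply(ns_doc, "//campaignID", xmlValue)

# Option 1: ignore namespaces entirely by matching on the local name
ids_local <- xpathSApply(ns_doc, "//*[local-name()='campaignID']", xmlValue)

# Option 2: bind a prefix of your choosing ("d" here) to the namespace URI
ids_prefixed <- xpathSApply(ns_doc, "//d:campaignID", xmlValue,
                            namespaces = c(d = "http://example.com/api"))
```

The local-name() form is handy when you don't know the URI up front. The prefixed form is stricter and keeps the paths shorter.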

You can also do the extraction functionally:

nodes <- c("campaignCategoryUID", "campaignID", "categoryID", "biddingStrategy", "cpc")
data <- rbind.data.frame(sapply(nodes, function(x) xpathSApply(doc, sprintf("//%s", x), xmlValue)))

This works provided you don't need to deal with edge cases, i.e. every record contains each of the nodes, so all the extractions return vectors of the same length.
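
If you do hit those edge cases, one option is to loop over the categoryBid nodes and look up each field relative to the node, filling in NA where a field is missing. This is only a sketch: first_or_na is a hypothetical helper, and the shortened XML below deliberately drops cpc from the second bid to show the effect.

```r
library(XML)

# Shortened sample; the second <categoryBid> has no <cpc> on purpose
bid_xml <- '<?xml version="1.0"?>
<getCampaignsResult>
  <categoryBid>
    <campaignCategoryUID>1234</campaignCategoryUID>
    <campaignID>1211</campaignID>
    <categoryID>1254</categoryID>
    <bidInformation>
      <biddingStrategy>Cpc</biddingStrategy>
      <cpcBid><cpc>0.5</cpc></cpcBid>
    </bidInformation>
  </categoryBid>
  <categoryBid>
    <campaignCategoryUID>5487</campaignCategoryUID>
    <campaignID>3244</campaignID>
    <categoryID>1234</categoryID>
    <bidInformation>
      <biddingStrategy>Cpa</biddingStrategy>
    </bidInformation>
  </categoryBid>
</getCampaignsResult>'

bid_doc <- xmlParse(bid_xml, asText = TRUE)
fields <- c("campaignCategoryUID", "campaignID", "categoryID",
            "biddingStrategy", "cpc")

# Hypothetical helper: first matching descendant of `node`, or NA if none
first_or_na <- function(node, name) {
  hit <- xpathSApply(node, sprintf(".//%s", name), xmlValue)
  if (length(hit) == 0) NA_character_ else hit[[1]]
}

# Build one single-row data frame per <categoryBid>, then stack them
bids <- getNodeSet(bid_doc, "//categoryBid")
rows <- lapply(bids, function(bid) {
  vals <- vapply(fields, function(f) first_or_na(bid, f), character(1))
  as.data.frame(as.list(vals), stringsAsFactors = FALSE)
})
bid_df <- do.call(rbind, rows)

# Values come back as character; convert the numeric column explicitly
bid_df$cpc <- as.numeric(bid_df$cpc)
```

The leading "." in the path matters here: ".//cpc" searches below the current bid, whereas "//cpc" would search the whole document again and misalign the rows.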

Upvotes: 1
