PBolbrinker

Reputation: 1

R: Extracting specific node content from XML data

Using R and the XML package (xmlTreeParse etc.), I have been trying to read specific nodes from XML files, without success. The following dummy XML example represents the data I am using:

<item> 
<title> Mickey Mouse </title>
<description> Cartoon </description>
<pubDate> 25 Apr 1965 </pubDate>
 <disney:Filing web="http://www.waltdisney.com/archives">
 <disney:fileNumber>125364</disney:fileNumber>
 <disney:assignedID>7389</disney:assignedID>
 <disney:Files>
  <disney:File disney:set="1" disney:file="abc.mov" disney:type="B&W"/>
  <disney:File disney:set="2" disney:file="def.mov" disney:type="Col"/>
  <disney:File disney:set="3" disney:file="wzt.mov" disney:type="B&W"/>
 </disney:Files>
</disney:Filing>
</item> 

I used xpathApply to extract the first three nodes successfully, but I cannot reach the nodes tagged "disney:File". For some reason, anything beyond disney:Files is unreadable ("invisible").

My goal is either to extract all the disney:File lines into a data frame or, better still, to first search for a specific disney:set and extract all the information from that node alone into a data frame. Any help would be greatly appreciated. Thanks in advance!

Upvotes: 0

Views: 5678

Answers (1)

jdharrison

Reputation: 30425

Here is some sample data:

'<?xml version="1.0"?>
<aw:PurchaseOrder
    aw:PurchaseOrderNumber="99503"
aw:OrderDate="1999-10-20"
xmlns:aw="http://www.adventure-works.com">
<aw:Address aw:Type="Shipping">
<aw:Name>Ellen Adams</aw:Name>
<aw:Street>123 Maple Street</aw:Street>
<aw:City>Mill Valley</aw:City>
<aw:State>CA</aw:State>
<aw:Zip>10999</aw:Zip>
<aw:Country>USA</aw:Country>
</aw:Address>
<aw:Address aw:Type="Billing">
<aw:Name>Tai Yee</aw:Name>
<aw:Street>8 Oak Avenue</aw:Street>
<aw:City>Old Town</aw:City>
<aw:State>PA</aw:State>
<aw:Zip>95819</aw:Zip>
<aw:Country>USA</aw:Country>
</aw:Address>
<aw:DeliveryNotes>Please leave packages in shed by driveway.</aw:DeliveryNotes>
<aw:Items>
<aw:Item aw:PartNumber="872-AA">
<aw:ProductName>Lawnmower</aw:ProductName>
<aw:Quantity>1</aw:Quantity>
<aw:USPrice>148.95</aw:USPrice>
<aw:Comment>Confirm this is electric</aw:Comment>
</aw:Item>
<aw:Item aw:PartNumber="926-AA">
<aw:ProductName>Baby Monitor</aw:ProductName>
<aw:Quantity>2</aw:Quantity>
<aw:USPrice>39.98</aw:USPrice>
<aw:ShipDate>1999-05-21</aw:ShipDate>
</aw:Item>
</aw:Items>
</aw:PurchaseOrder>' -> xData

You can declare the namespace and give it a prefix of your choice; here we use ns. In this case we could have just used aw:Item directly, but we map the namespace to our own prefix as an example:

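library(XML)

# Parse the XML held in the character string xData
myData <- xmlParse(xData, asText = TRUE)

# Map the prefix "ns" to the document's namespace URI and
# return the text of every ProductName under an Item
xpathSApply(myData, "//*/ns:Item/ns:ProductName",
            xmlValue,
            namespaces = c(ns = "http://www.adventure-works.com"))
# [1] "Lawnmower"    "Baby Monitor"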
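The same idea can be sketched for the attribute-only disney:File nodes in the question. This is a minimal sketch under stated assumptions, not a tested solution for the real files: it assumes the actual document declares the disney prefix with an xmlns:disney attribute (the dummy example does not show one), it uses http://www.waltdisney.com/archives as a stand-in namespace URI, and it escapes the ampersand in "B&W" so that the text is well-formed XML. The names xml_text, files, and set2 are made up for illustration.

```r
library(XML)

# Hypothetical stand-in for the question's data. The xmlns:disney
# declaration and the namespace URI are assumptions; "&" is written
# as "&amp;" so the document parses.
xml_text <- '<item xmlns:disney="http://www.waltdisney.com/archives">
  <title>Mickey Mouse</title>
  <disney:Files>
    <disney:File disney:set="1" disney:file="abc.mov" disney:type="B&amp;W"/>
    <disney:File disney:set="2" disney:file="def.mov" disney:type="Col"/>
    <disney:File disney:set="3" disney:file="wzt.mov" disney:type="B&amp;W"/>
  </disney:Files>
</item>'

doc <- xmlParse(xml_text, asText = TRUE)

# Map our own prefix "d" to the (assumed) namespace URI
ns <- c(d = "http://www.waltdisney.com/archives")

# One character vector of attributes per disney:File node,
# bound row-wise into a data frame with one row per node
attrs <- xpathApply(doc, "//d:File", xmlAttrs, namespaces = ns)
files <- as.data.frame(do.call(rbind, attrs), stringsAsFactors = FALSE)

# Restrict the match to a single set with an attribute predicate
set2 <- xpathApply(doc, "//d:File[@d:set='2']", xmlAttrs, namespaces = ns)
```

Here files should hold one row per disney:File node and set2 only the attributes of the node whose set is 2. If the real namespace URI differs, only the value in ns needs to change; the prefix used in the XPath expression is local to the query.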

Upvotes: 2
