Reputation: 1
Using R and the XML package (xmlTreeParse etc.), I tried my best to read specific nodes from XML files, without success. The following dummy XML example represents the data I am using:
<item>
<title> Mickey Mouse </title>
<description> Cartoon </description>
<pubDate> 25 Apr 1965 </pubDate>
<disney:Filing web="http://www.waltdisney.com/archives">
<disney:fileNumber>125364</disney:fileNumber>
<disney:assignedID>7389</disney:assignedID>
<disney:Files>
<disney:File disney:set="1" disney:file="abc.mov" disney:type="B&W"/>
<disney:File disney:set="2" disney:file="def.mov" disney:type="Col"/>
<disney:File disney:set="3" disney:file="wzt.mov" disney:type="B&W"/>
</disney:Files>
</disney:Filing>
</item>
I used xpathApply to extract the first three nodes successfully, but I cannot reach the nodes tagged "disney:File". For some reason anything beyond disney:Files is unreadable ("invisible").
My goal is either to extract all the disney:File lines into a data frame or, better still, to first search for a specific disney:set and extract all information from that node alone into a data frame. Any help would be really great. Thanks in advance!
Upvotes: 0
Views: 5678
Reputation: 30425
Some sample data:
'<?xml version="1.0"?>
<aw:PurchaseOrder
aw:PurchaseOrderNumber="99503"
aw:OrderDate="1999-10-20"
xmlns:aw="http://www.adventure-works.com">
<aw:Address aw:Type="Shipping">
<aw:Name>Ellen Adams</aw:Name>
<aw:Street>123 Maple Street</aw:Street>
<aw:City>Mill Valley</aw:City>
<aw:State>CA</aw:State>
<aw:Zip>10999</aw:Zip>
<aw:Country>USA</aw:Country>
</aw:Address>
<aw:Address aw:Type="Billing">
<aw:Name>Tai Yee</aw:Name>
<aw:Street>8 Oak Avenue</aw:Street>
<aw:City>Old Town</aw:City>
<aw:State>PA</aw:State>
<aw:Zip>95819</aw:Zip>
<aw:Country>USA</aw:Country>
</aw:Address>
<aw:DeliveryNotes>Please leave packages in shed by driveway.</aw:DeliveryNotes>
<aw:Items>
<aw:Item aw:PartNumber="872-AA">
<aw:ProductName>Lawnmower</aw:ProductName>
<aw:Quantity>1</aw:Quantity>
<aw:USPrice>148.95</aw:USPrice>
<aw:Comment>Confirm this is electric</aw:Comment>
</aw:Item>
<aw:Item aw:PartNumber="926-AA">
<aw:ProductName>Baby Monitor</aw:ProductName>
<aw:Quantity>2</aw:Quantity>
<aw:USPrice>39.98</aw:USPrice>
<aw:ShipDate>1999-05-21</aw:ShipDate>
</aw:Item>
</aw:Items>
</aw:PurchaseOrder>' -> xData
You can declare the namespace and give it a prefix of your choosing; here we use ns. In this case we could have just used aw:Item, since that prefix is already declared in the document, but we map the namespace explicitly as an example:
library(XML)
myData <- xmlParse(xData)
xpathSApply(myData, "//*/ns:Item/ns:ProductName",
            namespaces = c(ns = "http://www.adventure-works.com"),
            xmlValue)
# [1] "Lawnmower" "Baby Monitor"
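The same namespaces argument might carry over to the data in the question. The following is only a minimal sketch under two assumptions that the dummy snippet does not show: that the real file declares the prefix with xmlns:disney on an ancestor element (the URI below is borrowed from the web attribute in the question), and that the ampersand in "B&W" is escaped as &amp;. The names xDisney, doc, ns, files and set2 are made up for illustration.

```r
library(XML)

# Assumed well-formed variant of the question's data: the disney prefix is
# declared via xmlns:disney and "&" is escaped (neither is in the dummy snippet).
xDisney <- '<item xmlns:disney="http://www.waltdisney.com/archives">
  <title>Mickey Mouse</title>
  <disney:Filing>
    <disney:Files>
      <disney:File disney:set="1" disney:file="abc.mov" disney:type="B&amp;W"/>
      <disney:File disney:set="2" disney:file="def.mov" disney:type="Col"/>
      <disney:File disney:set="3" disney:file="wzt.mov" disney:type="B&amp;W"/>
    </disney:Files>
  </disney:Filing>
</item>'

doc <- xmlParse(xDisney, asText = TRUE)

# Map a prefix of our own choosing (d) to the namespace URI
ns <- c(d = "http://www.waltdisney.com/archives")

# One character vector of attributes per disney:File node,
# row-bound into a data frame with one row per node
attrs <- xpathApply(doc, "//d:File", xmlAttrs, namespaces = ns)
files <- as.data.frame(do.call(rbind, attrs), stringsAsFactors = FALSE)

# Only the node whose disney:set attribute equals "2"
set2 <- xpathApply(doc, "//d:File[@d:set='2']", xmlAttrs, namespaces = ns)
```

Here files should hold one row per disney:File node, with the columns named after the attributes as xmlAttrs returns them, and set2 should hold the attributes of the single matching node. If the real file does not declare the disney prefix at all, the parser will likely complain about it, and that would need fixing at the source first.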
Upvotes: 2