Reputation: 4761
I'm attempting to parse the following XML file in R: http://reports.ieso.ca/public/GenOutputCapability/PUB_GenOutputCapability_20140517_v24.xml
My script is dead simple so far:
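library(XML)  # provides xmlTreeParse, xmlRoot, xpathSApply, xmlValue
file <- "http://reports.ieso.ca/public/GenOutputCapability/PUB_GenOutputCapability_20140517_v24.xml"
doc <- xmlTreeParse(file, useInternal=TRUE)
rootNode <- xmlRoot(doc)
# Query every GeneratorName element and return its text
xpathSApply(rootNode, "//GeneratorName", xmlValue)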
Whenever I run this, my output is simply an empty list.
The same approach extracts values from other XML files without trouble, but for this particular file I can't extract anything. I've tried a number of different nodes, different capitalizations, useInternal=FALSE, and every other combination I could think of, but still no luck.
I can access parts using the rootNode[["IMODocBody"]][["Date"]] syntax to get the date, for example, so I know the file is loaded. Any ideas?
Upvotes: 2
Views: 275
Reputation: 30425
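You need to use the appropriate namespace. The file declares a default namespace (http://www.theIMO.com/schema), and XPath treats an unprefixed name such as GeneratorName as belonging to no namespace, so "//GeneratorName" matches nothing. Bind a prefix to that URI and use it in the query: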
> head(xpathSApply(doc, "//ns:GeneratorName", xmlValue
, namespaces = c(ns = "http://www.theIMO.com/schema")))
[1] "BRUCEA-G1" "BRUCEA-G2" "BRUCEA-G3" "BRUCEA-G4" "BRUCEB-G5" "BRUCEB-G6"
See ?xmlNamespaceDefinitions, which lists the namespaces the document declares:
> xmlNamespaceDefinitions(doc)
[[1]]
$id
[1] ""
$uri
[1] "http://www.theIMO.com/schema"
$local
[1] TRUE
attr(,"class")
[1] "XMLNamespaceDefinition"
$xsi
$id
[1] "xsi"
$uri
[1] "http://www.w3.org/2001/XMLSchema-instance"
$local
[1] TRUE
attr(,"class")
[1] "XMLNamespaceDefinition"
attr(,"class")
[1] "XMLNamespaceDefinitions"
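To see the effect without downloading the report, here is a minimal sketch using an inline document. The XML snippet is a made-up stand-in, not the real file's layout: it only reuses the same default namespace URI and two of the generator names shown above. It compares the unprefixed query, the prefixed query, and a local-name() query, which is one possible alternative if you would rather not bind a prefix.

```r
library(XML)

# Made-up stand-in for the real report: same default namespace URI,
# but the element layout here is invented for illustration.
xml_text <- '<IMODocument xmlns="http://www.theIMO.com/schema">
  <IMODocBody>
    <Generator><GeneratorName>BRUCEA-G1</GeneratorName></Generator>
    <Generator><GeneratorName>BRUCEA-G2</GeneratorName></Generator>
  </IMODocBody>
</IMODocument>'

doc <- xmlParse(xml_text, asText = TRUE)

# Unprefixed name: looks for GeneratorName in no namespace, so nothing matches.
unprefixed <- xpathSApply(doc, "//GeneratorName", xmlValue)

# Prefixed name: "ns" is an arbitrary prefix bound to the document's default namespace URI.
prefixed <- xpathSApply(doc, "//ns:GeneratorName", xmlValue,
                        namespaces = c(ns = "http://www.theIMO.com/schema"))

# Alternative: match on the local name only, ignoring the namespace entirely.
by_local <- xpathSApply(doc, "//*[local-name()='GeneratorName']", xmlValue)

print(length(unprefixed))
print(prefixed)
print(by_local)
```

The prefix name itself does not matter; only the URI it is bound to has to match the one the document declares. The local-name() form is shorter to write but will also match same-named elements from any other namespace, so the prefixed form is the safer default.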
Upvotes: 6