Reputation: 124
I'm trying to write some code to return the values of a given element in an XML feed. The following code works for all of the feeds except uk_legislation_feed. Can someone give me a hint as to why this might be and how to fix the problem? Thanks.
library(XML)
uk_legislation_feed <- c("http://www.legislation.gov.uk/new/data.feed", "xml", "//title")
test_feed <- c("https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Frestaurants.xml", "xml", "//zipcode")
ons_feed <- c("https://www.ons.gov.uk/releasecalendar?rss", "xml", "//title")
read_data <- function(feed) {
  if (feed[2] == "xml") {
    if (!file.exists(feed[1])) download.file(feed[1], "tmp.xml", "curl")
    dat <- xmlRoot(xmlTreeParse("tmp.xml", useInternalNodes = TRUE))
  }
  titles <- xpathSApply(dat, feed[3], xmlValue)
  return(titles)
}
Upvotes: 0
Views: 317
Reputation: 107587
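The uk_legislation_feed document declares a default namespace, http://www.w3.org/2005/Atom, without a prefix (a bare xmlns attribute). In XPath, an unprefixed name such as //title only matches elements in no namespace, so the nodes are not matched. Hence, you will need to bind a prefix to that namespace URI and use the prefix in the XPath expression: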
url <- "http://www.legislation.gov.uk/new/data.feed"
webpage <- readLines(url)
file <- xmlParse(webpage)
nmsp <- c(ns="http://www.w3.org/2005/Atom")
titles <- xpathSApply(file, "//ns:title", xmlValue,
                      namespaces = nmsp)
titles
# [1] "Search Results"
# [2] "The Air Navigation (Restriction of Flying) (RNAS Culdrose) (Amendment) \
# Regulations 2016"
...
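To see the effect without hitting the network, here is a minimal sketch using a small inline document. The sample XML is made up for illustration (it only mimics the default-namespace declaration of an Atom feed), and the prefix "ns" is an arbitrary choice. The last line shows a possible alternative, assuming you would rather match on the local name than declare a prefix:

```r
library(XML)

# Made-up Atom-like sample with a default (unprefixed) namespace.
# This is NOT the real legislation.gov.uk feed, only an illustration.
atom_txt <- '<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Sample feed</title>
  <entry><title>First entry</title></entry>
</feed>'

doc <- xmlParse(atom_txt, asText = TRUE)

# Unprefixed name: XPath looks for <title> in no namespace, so nothing matches.
no_ns <- xpathSApply(doc, "//title", xmlValue)
length(no_ns)
# [1] 0

# Bind a prefix of our choosing ("ns") to the namespace URI and use it in the path.
nmsp <- c(ns = "http://www.w3.org/2005/Atom")
with_ns <- xpathSApply(doc, "//ns:title", xmlValue, namespaces = nmsp)
with_ns
# [1] "Sample feed" "First entry"

# Namespace-agnostic alternative: compare only the local element name.
any_ns <- xpathSApply(doc, "//*[local-name() = 'title']", xmlValue)
```

The local-name() form would let a single expression work for feeds with and without a default namespace, at the cost of also matching title elements from any other namespace in the document.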
Upvotes: 3