Dan Luba

Reputation: 124

xpathSApply not finding required node

I'm trying to write some code that returns the values of a given element in an XML feed. The following code works for all of the feeds except uk_legislation_feed. Can someone give me a hint as to why this might be, and how to fix it? Thanks.

library(XML)

uk_legislation_feed <- c("http://www.legislation.gov.uk/new/data.feed", "xml", "//title")
test_feed <- c("https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Frestaurants.xml", "xml", "//zipcode")
ons_feed <- c("https://www.ons.gov.uk/releasecalendar?rss", "xml", "//title")

read_data <- function(feed) {
  if (feed[2] == "xml") {
    if (!file.exists(feed[1])) download.file(feed[1], "tmp.xml", "curl")
    dat <- xmlRoot(xmlTreeParse("tmp.xml", useInternalNodes = TRUE))
  }
  titles <- xpathSApply(dat, feed[3], xmlValue)

  return(titles)
}

Upvotes: 0

Views: 317

Answers (1)

Parfait

Reputation: 107587

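The root of uk_legislation_feed declares a default namespace, http://www.w3.org/2005/Atom, with no prefix (a bare xmlns attribute). In XPath 1.0 an unprefixed name such as title only matches elements in no namespace, so //title finds nothing in this feed. Hence, you will need to bind a prefix of your own to that URI and use it in the XPath expression: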

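library(XML)

url <- "http://www.legislation.gov.uk/new/data.feed"
webpage <- readLines(url)

# Parse the downloaded lines as one XML document
doc <- xmlParse(webpage)
# Bind a prefix of our own choosing ("ns") to the feed's default namespace
nmsp <- c(ns="http://www.w3.org/2005/Atom")

# Use that prefix on the element name in the XPath expression
titles <- xpathSApply(doc, "//ns:title", xmlValue,
                      namespaces = nmsp)
titles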

# [1] "Search Results"
# [2] "The Air Navigation (Restriction of Flying) (RNAS Culdrose) (Amendment) Regulations 2016"
...
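
To see the effect without hitting the network, here is a minimal sketch on a small inline document. The XML string and the variable names (atom_txt, plain, prefixed, by_local) are made up for illustration and are not taken from the real feed; only the namespace URI is. It also shows local-name() as one possible namespace-agnostic alternative, which may be handy if the same function has to handle both namespaced and plain feeds.

```r
library(XML)

# Hypothetical stand-in for an Atom feed: the default namespace has no prefix.
atom_txt <- paste0(
  '<feed xmlns="http://www.w3.org/2005/Atom">',
  '<title>Search Results</title>',
  '<entry><title>Example entry</title></entry>',
  '</feed>'
)
doc <- xmlParse(atom_txt, asText = TRUE)

# Unprefixed names in XPath 1.0 mean "no namespace", so nothing matches.
plain <- xpathSApply(doc, "//title", xmlValue)

# Binding a prefix ("ns", chosen arbitrarily) to the Atom URI reaches the nodes.
nmsp <- c(ns = "http://www.w3.org/2005/Atom")
prefixed <- xpathSApply(doc, "//ns:title", xmlValue, namespaces = nmsp)

# Alternative: ignore namespaces and match on the local element name only.
by_local <- xpathSApply(doc, "//*[local-name() = 'title']", xmlValue)

length(plain)
prefixed
by_local
```

The prefix itself is arbitrary; what matters is that the URI matches the one declared in the document. The local-name() form is more forgiving, but it will also match title elements from any other namespace that happens to be present.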

Upvotes: 3
