Kai Seward

Reputation: 33

Parsing iTunes RSS in R

I'm trying to parse the iTunes top 100 in R and extract the artist, song, etc., but I'm having trouble with the XML file. I was able to get usable data easily from Billboard's RSS feed (http://www1.billboard.com/rss/charts/hot-100):

library(XML)  # provides xmlTreeParse(), xmlRoot(), xpathApply(), xmlSApply(), xmlValue()

GetBillboard <- function() {

  hot.100 <- xmlTreeParse("http://www1.billboard.com/rss/charts/hot-100")
  hot.100 <- xpathApply(xmlRoot(hot.100), "//item")

  top.songs <- character(length(hot.100))

  for(i in 1:length(hot.100)) {
    top.songs[i] <- xmlSApply(hot.100[[i]], xmlValue)[3]
  }
  return(top.songs)

}

I tried a similar strategy with the iTunes feed (https://itunes.apple.com/us/rss/topmusicvideos/limit=100/explicit=true/xml):

library(RCurl)  # provides getURL()

GetITunes <- function() {
  itunes.raw <- getURL("https://itunes.apple.com/us/rss/topmusicvideos/limit=100/explicit=true/xml")
  itunes.xml <- xmlTreeParse(itunes.raw)
  top.vids <- xpathApply(xmlRoot(itunes.xml), "//entry")
  return(top.vids)
}

But I just get an empty node set:

> m <- GetITunes()
> m
list()
attr(,"class")
[1] "XMLNodeSet"
> 

I'm guessing it's the formatting of the XML file. How can I get these iTunes data to fall into a similar structure as the data from Billboard at this point in the first function?

hot.100 <- xpathApply(xmlRoot(hot.100), "//item")

Thanks!

Upvotes: 3

Views: 455

Answers (1)

MrFlick

Reputation: 206446

The problem is that your XML document declares a default namespace and your XPath expression does not take it into account. Unfortunately, when a document has a default namespace, you have to reference that namespace explicitly in the XPath expression. This should work:

xpathApply(xmlRoot(itunes.xml), "//d:entry", 
    namespaces=c(d="http://www.w3.org/2005/Atom"))

Here we arbitrarily choose d as the prefix for the default namespace used in the XML document and then use that prefix in our XPath expression.
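
To see the effect without hitting the network, here is a minimal sketch that uses a small inline document in place of the real feed. The feed text, the titles, and the variable names (feed, doc, ns, no.ns, titles) are made up for illustration; only the Atom namespace URI comes from the snippet above. It assumes the XML package is installed.

```r
library(XML)

# Hypothetical, trimmed-down Atom document standing in for the iTunes response.
# The root element declares a default namespace, just like the real feed.
feed <- '<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><title>Song A - Artist A</title></entry>
  <entry><title>Song B - Artist B</title></entry>
</feed>'

# Parse the string into an internal document that XPath can query.
doc <- xmlParse(feed, asText = TRUE)

# Map an arbitrary prefix ("d") to the document's default namespace.
ns <- c(d = "http://www.w3.org/2005/Atom")

# An unprefixed name test only matches elements in no namespace,
# so this query finds nothing.
no.ns <- xpathApply(doc, "//entry")

# With the prefix on every step, the entries and their titles are found.
titles <- xpathSApply(doc, "//d:entry/d:title", xmlValue, namespaces = ns)
```

Note that every element step in the path needs the prefix, not only the first one, because child elements inherit the default namespace from the root.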

Upvotes: 2
