Zach

Reputation: 30301

Return a list of links from a webpage in R

I'm trying to write a function in R that, given a URL, will return a list of the links on that webpage.

For example:

getLinks("http://prog21.dadgum.com/109.html")

Would return:

"http://prog21.dadgum.com/prog21.css"
"http://prog21.dadgum.com/atom.xml"
"http://prog21.dadgum.com/index.html"
"http://prog21.dadgum.com/archives.html"
"http://prog21.dadgum.com/atom.xml"
"http://prog21.dadgum.com/56.html"
"http://prog21.dadgum.com/39.html"
"http://prog21.dadgum.com/109.html"
"http://prog21.dadgum.com/108.html"
"http://prog21.dadgum.com/107.html"
"http://prog21.dadgum.com/106.html"
"http://prog21.dadgum.com/105.html"
"http://prog21.dadgum.com/104.html"

Upvotes: 2

Views: 309

Answers (1)

Zach

Reputation: 30301

This function seems to work on other webpages, but it does not return complete URLs for the page in question, presumably because that page uses relative links. I'm interested to see if there's a better way to do this.

getLinks <- function(URL) {
    require(XML)
    # Parse the page into an HTML document tree
    doc <- htmlParse(URL)
    # Collect every href attribute anywhere in the document
    out <- unlist(doc['//@href'])
    # Drop the attribute names so a plain character vector is returned
    names(out) <- NULL
    out
}
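
The incomplete results are relative links ("prog21.css", "atom.xml") coming back exactly as they appear in the page source. A minimal sketch of a fix, assuming the XML package's getRelativeURL() resolves a relative path against a base URL and passes already-absolute URLs through unchanged:

getLinks <- function(URL) {
    require(XML)
    doc <- htmlParse(URL)
    # Relative hrefs come back exactly as written in the source
    hrefs <- unname(unlist(doc["//@href"]))
    # Resolve each href against the page's own URL to get absolute links
    unname(getRelativeURL(hrefs, URL))
}

With that change, getLinks("http://prog21.dadgum.com/109.html") should return the fully qualified addresses shown in the question. Newer versions of the XML package also ship getHTMLLinks(), which may cover this case directly.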

Upvotes: 3
