Walker Harrison

Reputation: 537

Extracting hyperlink from HTML table with rvest

I've seen similar questions here and tried the suggested solutions, but I still can't figure this one out. I'm still an R novice, so bear with me: I've managed to get a table of Barack Obama's speeches from this website using rvest:

library(rvest)
page <- read_html("http://www.americanrhetoric.com/barackobamaspeeches.htm")
speeches <- page %>%
  html_nodes(xpath = '//*[@id="AutoNumber1"]') %>% 
  html_table(fill=TRUE)
speeches <- speeches[[1]][,2:4]
head(speeches)

which yields:

            X2                                            X3    X4
1             <NA>                                          <NA>  <NA>
2    Delivery Date                  Speech Title/Text/MultiMedia Audio
3     27 July 2004 Democratic National Convention Keynote Speech   mp3
4  06 January 2005 Senate Speech on Ohio Electoral Vote Counting   mp3
5     04 June 2005              Knox College Commencement Speech   mp3
6 15 December 2005              Senate Speech on the PATRIOT Act   mp3

However, I'd also like to extract the hyperlink for each entry in the "Speech" column, which lives in the href attribute of the link. From what I've read online, the usual suggestion is to pull the attribute with html_attr('href'), but if I add that to the pipeline above I get this error:

Error in UseMethod("xml_attr") : no applicable method for 'xml_attr' applied to an object of class "list"

Another person suggested tinkering with the function itself using trace, but that seems overly involved for something that should be fairly straightforward. Any idea where I'm tripping up?

Upvotes: 1

Views: 3311

Answers (1)

Constantinos

Reputation: 1327

Using Selector Gadget to determine the node, I extracted the URLs with:

page %>% html_nodes("td:nth-child(2) a") %>% html_attr("href")
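As for the error in the question: html_table() returns a list of data frames, and html_attr() only works on nodes, so the attribute has to be read from the a nodes rather than from the table output. Here is a minimal sketch of pairing each title with its link. The inline HTML is a made-up stand-in for the real page so the snippet runs offline, and the names doc, links and speech_links are just for illustration:

```r
library(rvest)

# Hypothetical stand-in for the real page (assumed layout: the speech
# link sits in the second cell of each row).
doc <- read_html('
  <table id="AutoNumber1">
    <tr><td>1</td><td><a href="speeches/a.htm">Speech A</a></td><td>mp3</td></tr>
    <tr><td>2</td><td><a href="speeches/b.htm">Speech B</a></td><td>mp3</td></tr>
  </table>')

# Select the anchor nodes first; html_attr() needs nodes, not the
# list of data frames that html_table() produces.
links <- doc %>% html_nodes("td:nth-child(2) a")

# Read the link text and the href from the same nodes so they stay aligned.
speech_links <- data.frame(
  title = html_text(links, trim = TRUE),
  href  = html_attr(links, "href"),
  stringsAsFactors = FALSE
)

print(speech_links)
```

On the real page you would start from page instead of doc. If the hrefs turn out to be relative, xml2::url_absolute() with the page URL as the base should resolve them, and rows without a link in that cell simply won't appear in the result, so check the row count before binding it to the speeches table.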

Upvotes: 5
