Reputation: 537
I've seen similar questions here and tried the suggested solutions, but I still can't figure this one out. I'm still an R novice, so bear with me. I've managed to get a table of Barack Obama's speeches from this website using rvest:
library(rvest)
page <- read_html("http://www.americanrhetoric.com/barackobamaspeeches.htm")
speeches <- page %>%
  html_nodes(xpath = '//*[@id="AutoNumber1"]') %>%
  html_table(fill = TRUE)
speeches <- speeches[[1]][,2:4]
head(speeches)
which yields:
                X2                                            X3    X4
1             <NA>                                          <NA>  <NA>
2    Delivery Date                  Speech Title/Text/MultiMedia Audio
3     27 July 2004 Democratic National Convention Keynote Speech   mp3
4  06 January 2005 Senate Speech on Ohio Electoral Vote Counting   mp3
5     04 June 2005              Knox College Commencement Speech   mp3
6 15 December 2005              Senate Speech on the PATRIOT Act   mp3
However, I'd also like to extract the hyperlink for each entry in the "Speech" column, which naturally lives in the href attribute. I've researched this pretty thoroughly online, and some people say to also specify the HTML attribute with html_attr('href'), but if I add that to the end of the pipeline above I get this error:
Error in UseMethod("xml_attr") : no applicable method for 'xml_attr' applied to an object of class "list"
Another person suggested tinkering with the actual function using trace, but that seems overly involved for something that should be fairly straightforward. Any idea where I'm tripping up?
Upvotes: 1
Views: 3311
Reputation: 1327
Using SelectorGadget to find a CSS selector for the links, I extracted the URLs with:
page %>% html_nodes("td:nth-child(2) a") %>% html_attr("href")
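The error in the question comes from the order of the calls: html_table() returns a list of data frames, so by the time html_attr('href') runs there are no nodes left to read attributes from. Selecting the a nodes first avoids that. Here is a minimal sketch of keeping each link's text and href side by side. The inline HTML is a made-up stand-in for the real page so the example runs offline, the file names in it are hypothetical, and restricting the selector to #AutoNumber1 and resolving relative links against the site root are my assumptions, not something I checked against the live page.

```r
library(rvest)

# Hypothetical stand-in for the real page (made-up rows and file names),
# shaped like the table in the question: date, linked title, audio.
html <- '<table id="AutoNumber1">
  <tr><td>Delivery Date</td><td>Speech Title</td><td>Audio</td></tr>
  <tr><td>27 July 2004</td><td><a href="speeches/example-one.htm">Example Speech One</a></td><td>mp3</td></tr>
  <tr><td>06 January 2005</td><td><a href="speeches/example-two.htm">Example Speech Two</a></td><td>mp3</td></tr>
</table>'
page <- read_html(html)

# Select the <a> nodes themselves; attributes can only be read from nodes,
# not from the data frames that html_table() produces.
links <- page %>% html_nodes("#AutoNumber1 td:nth-child(2) a")

speech_links <- data.frame(
  title = html_text(links, trim = TRUE),   # visible link text
  href  = html_attr(links, "href"),        # raw href attribute
  stringsAsFactors = FALSE
)

# Assumption: relative hrefs resolve against the site root.
speech_links$url <- xml2::url_absolute(speech_links$href,
                                       "http://www.americanrhetoric.com/")

print(speech_links)
```

Rows without a link (such as the header row) simply produce no node, so the result may have fewer rows than the table from html_table(). If you need both in one data frame, merging on the title text is probably safer than binding by position.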
Upvotes: 5