I am working on a project in R. I want to extract the link aosmith.com as it appears on the Wikipedia page https://en.wikipedia.org/wiki/A._O._Smith. My question may have been asked before, but I haven't managed to find a solution yet. Here is what I have tried so far, without success:
library(rvest)
library(magrittr)
url <- "https://en.wikipedia.org/wiki/A._O._Smith"
links <- read_html(url) %>% html_nodes(., ".lister-item-header a") %>% html_attr(., "href")
This should work for any Wikipedia page assigned to url whose infobox lists a website, and it returns only the desired URL:
library(rvest)
library(magrittr)
url <- "https://en.wikipedia.org/wiki/A._O._Smith"
link <- read_html(url) %>%
  html_nodes(".infobox") %>%
  html_nodes(".url > a") %>%
  html_attr("href")
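Because the same pipeline is meant to work for any Wikipedia page, it can be wrapped in a small function. A minimal sketch, assuming the page still uses the infobox and url class names: infobox_website() is a hypothetical helper name, and the HTML string is made-up markup that only imitates an infobox "Website" row so the example runs offline. It also shows why the first html_nodes(".infobox") step matters: a url span outside the infobox is ignored.

```r
library(rvest)
library(magrittr)

# Hypothetical helper: returns the website link(s) listed in a Wikipedia-style infobox.
# `x` may be a page URL or a string of literal HTML (read_html() accepts both).
infobox_website <- function(x) {
  read_html(x) %>%
    html_nodes(".infobox") %>%  # restrict the search to the infobox
    html_nodes(".url > a") %>%  # <a> elements that are direct children of class "url"
    html_attr("href")           # extract the link targets
}

# Made-up markup imitating an infobox "Website" row (illustrative only).
page <- '<html><body>
  <table class="infobox">
    <tr><th>Website</th>
        <td><span class="url"><a href="http://www.aosmith.com">aosmith.com</a></span></td></tr>
  </table>
  <p><span class="url"><a href="http://other.example">outside the infobox</a></span></p>
</body></html>'

infobox_website(page)
#> [1] "http://www.aosmith.com"
```

Calling infobox_website(url) with the Wikipedia URL should behave like the answer above; if a page has no matching infobox entry, the result is character(0) rather than an error.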
Using the browser's inspector tool (F12, then Ctrl+Shift+C), you can copy the XPath of the link: click aosmith.com on the page, then, in the Elements panel, right-click the highlighted (blue) element and copy its XPath. In R, pass the copied XPath to html_nodes() to access the desired element.
library(rvest)
library(magrittr)
url <- "https://en.wikipedia.org/wiki/A._O._Smith"
link <- read_html(url) %>%
  html_nodes(xpath = '//*[@id="mw-content-text"]/div/table/tbody/tr[19]/td/span/a') %>%
  html_attr("href")
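A copied XPath such as tr[19] is positional, so it silently picks a different element if a row is added above the target. A minimal sketch of that failure on two made-up versions of an infobox (not the real Wikipedia HTML), compared with an alternative XPath that selects the row whose header cell reads "Website". The row labels, the link to /wiki/Hypothetical_Person, and the get_href() helper are all hypothetical.

```r
library(rvest)
library(magrittr)

# Made-up infobox markup (illustrative only). `updated` imitates a page edit
# that inserts one extra row above the "Website" row.
original <- '<table class="infobox">
  <tr><th>Founded</th><td>placeholder</td></tr>
  <tr><th>Website</th><td><span class="url"><a href="http://www.aosmith.com">aosmith.com</a></span></td></tr>
</table>'
updated <- '<table class="infobox">
  <tr><th>Founded</th><td>placeholder</td></tr>
  <tr><th>Key people</th><td><a href="/wiki/Hypothetical_Person">Hypothetical Person</a></td></tr>
  <tr><th>Website</th><td><span class="url"><a href="http://www.aosmith.com">aosmith.com</a></span></td></tr>
</table>'

# Positional XPath, in the style of a browser-copied one: "the link in row 2".
by_position <- "//table//tr[2]/td//a"
# Label-based alternative: the row whose <th> text is exactly "Website".
by_label <- "//table[contains(@class, 'infobox')]//tr[th = 'Website']/td//a"

# Hypothetical helper: parse literal HTML and return the href of the matched links.
get_href <- function(html, xpath) {
  read_html(html) %>% html_nodes(xpath = xpath) %>% html_attr("href")
}

get_href(original, by_position)  # "http://www.aosmith.com"
get_href(updated, by_position)   # "/wiki/Hypothetical_Person" (wrong row after the edit)
get_href(original, by_label)     # "http://www.aosmith.com"
get_href(updated, by_label)      # "http://www.aosmith.com"
```

The // before tr is used so the expressions match whether or not the parsed HTML contains a tbody element.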
You get more control and generalisability by writing your own XPath expression. This one simply searches for the link whose text is "A.O. Smith". Compared with the numbered XPaths generated by the browser, it is less likely to break if or when the page is updated.
library(rvest)
library(magrittr)
url <- "https://en.wikipedia.org/wiki/A._O._Smith"
link <- read_html(url) %>%
  html_nodes(xpath = "//a[text() = 'A.O. Smith']") %>%
  html_attr("href")
link
#> [1] "http://www.aosmith.com"
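The predicate text() = 'A.O. Smith' is an exact comparison against the link's direct text node, so surrounding whitespace inside the a element would make it miss. A minimal sketch on made-up markup (not the actual Wikipedia HTML) of two variants: normalize-space(.), which trims and collapses whitespace before comparing, and contains(@href, ...), which matches the link target instead of its text.

```r
library(rvest)
library(magrittr)

# Made-up markup (illustrative only): the link text is padded with whitespace.
page <- '<p>Official site:
  <a href="http://www.aosmith.com">
    A.O. Smith
  </a></p>'

doc <- read_html(page)

# Exact match on the raw text node: fails because of the padding.
exact <- doc %>%
  html_nodes(xpath = "//a[text() = 'A.O. Smith']") %>%
  html_attr("href")
# normalize-space(.) strips leading/trailing whitespace and collapses inner runs.
trimmed <- doc %>%
  html_nodes(xpath = "//a[normalize-space(.) = 'A.O. Smith']") %>%
  html_attr("href")
# Match on the link target itself rather than on its visible text.
by_href <- doc %>%
  html_nodes(xpath = "//a[contains(@href, 'aosmith.com')]") %>%
  html_attr("href")

exact    # character(0)
trimmed  # "http://www.aosmith.com"
by_href  # "http://www.aosmith.com"
```

Matching on the href is closest to what the question asks for (the aosmith.com link), while matching on the text keeps the answer's approach but tolerates formatting changes.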