Derek Corcoran


Trying to extract the links of R packages using rvest

I have been following this question and this tutorial to get the table and the links for the list of available R packages on CRAN.

Getting the HTML table

I got that part working with:

library(rvest)

page <- read_html("http://cran.r-project.org/web/packages/available_packages_by_name.html") %>%
  html_node("table") %>%
  html_table(fill = TRUE, header = FALSE)

Trying to get the links

Getting the links is where I run into trouble. I used SelectorGadget on the first column of the table (the package links) and got the selector td a, so I tried this:

test2 <- read_html("http://cran.r-project.org/web/packages/available_packages_by_name.html") %>%
  html_node("td a") %>%
  html_attr("href")

But I only get the first link. Then I thought I could get all the href attributes from the table, and tried the following:

test3 <- read_html("http://cran.r-project.org/web/packages/available_packages_by_name.html") %>%
  html_node("table") %>%
  html_attr("href")

But I got nothing. What am I doing wrong?


Answers (1)

Aurèle


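Essentially, an "s" is missing: use html_nodes() instead of html_node(). html_node() returns only the first element that matches the selector, whereas html_nodes() returns all of them. The third attempt fails because the <table> element itself has no href attribute; the attribute lives on the <a> elements inside it. With html_nodes():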

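library(rvest)

# Parse the page once and reuse the parsed document
x <-
  read_html(paste0(
    "http://cran.r-project.org/web/",
    "packages/available_packages_by_name.html"))

# html_nodes() returns every <a> inside a <td>, not just the first one
html_nodes(x, "td a") %>%
  sapply(html_attr, "href")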
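
To see the difference between the three attempts without hitting the network, here is a minimal sketch on a tiny stand-in page. The markup, package names, and hrefs are made up for illustration and are only assumed to resemble the CRAN table (one link per row in the first column). html_attr() is vectorised over a node set, so the sapply() call is optional. In rvest 1.0.0 and later the same functions are also available as html_element() and html_elements().

```r
library(rvest)

# Hypothetical stand-in for the CRAN page: two rows, one link per row.
doc <- minimal_html('
  <table>
    <tr><td><a href="pkgA/index.html">pkgA</a></td><td>First package</td></tr>
    <tr><td><a href="pkgB/index.html">pkgB</a></td><td>Second package</td></tr>
  </table>')

# html_node(): only the first match, so a single href comes back.
first_link <- doc %>% html_node("td a") %>% html_attr("href")

# html_nodes(): every match; html_attr() is applied to each node.
all_links <- doc %>% html_nodes("td a") %>% html_attr("href")

# The <table> element has no href attribute, so the result is NA.
no_href <- doc %>% html_node("table") %>% html_attr("href")

# The hrefs are relative; xml2::url_absolute() resolves them against a
# base URL (the base used here is the page URL from the question).
base <- "http://cran.r-project.org/web/packages/available_packages_by_name.html"
full_links <- xml2::url_absolute(all_links, base)

# Pair each link with its visible text (the package name).
links <- data.frame(
  package = doc %>% html_nodes("td a") %>% html_text(),
  url     = full_links,
  stringsAsFactors = FALSE
)
```

On the real page, replacing doc with the result of read_html() on the CRAN URL should give one row per package, assuming the table still keeps its links in td a elements.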
