Reputation: 65
I'd like to add a link to each individual petition to the data frame built by the scrape_change_page function below, but I'm not sure how to adapt it to use html_attr so that it scrapes the URLs across multiple pages. Any thoughts are very welcome!
pacman::p_load(rvest, dplyr, stringr, purrr, lubridate, tibble, tidyr, stringi)
url <- ''
scrape_change_page <- function(url) {
  webpage <- xml2::read_html(url)

  # Extract trimmed text for a CSS selector, padded to 10 entries
  # so that every column has the same length
  get_text <- function(css) {
    vec <- rvest::html_text(rvest::html_nodes(webpage, css), trim = TRUE)
    if (length(vec) < 10) c(vec, rep("", 10 - length(vec))) else vec
  }

  tibble::tibble(
    title = get_text('.xs-mbs'),
    date = gsub("Created", "", get_text('.symbol-clock+ span')),
    supporters = gsub(" supporters", "", get_text('.symbol-supporters+ span')),
    addressee = gsub("Petition to ", "", get_text('.xs-mbn .type-s')),
    location = get_text('.plxxs')
  )
}
# Select the number of pages (tested on 3 pages)
n_pages <- 3
urls <- paste0(url, "&offset=", 10 * (seq(n_pages) - 1))
result <- do.call(rbind, lapply(urls, scrape_change_page))
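To sanity-check the pagination before scraping, the page URLs can be built and inspected without any network access. This is only a minimal sketch: `base_url` is a placeholder, and it assumes each results page holds 10 petitions with the first page at offset 0.

```r
# Hypothetical search URL, used only for illustration
base_url <- "https://example.org/search?q=test"

n_pages <- 3
# Offsets 0, 10, 20: one step of 10 per page, starting at the first result
offsets <- 10 * (seq_len(n_pages) - 1)
page_urls <- paste0(base_url, "&offset=", offsets)

print(page_urls)
```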
Here's the basic html_attr call that scrapes the links from a single page:
page <- read_html("")
page %>%
  html_nodes(".search-results .list-rule") %>%
  html_nodes("a") %>%
  html_attr("href")
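The same pipeline can be tried on a small inline HTML fragment to see what html_attr returns. This is a sketch only: the markup below is invented to mirror the selectors above and is not the real page structure.

```r
library(rvest)

# Invented markup that mimics the selectors used above (an assumption)
html <- '<div class="search-results">
  <div class="list-rule"><a href="/p/first-petition">First</a></div>
  <div class="list-rule"><a href="/p/second-petition">Second</a></div>
</div>'

page <- xml2::read_html(html)

# Select the anchors inside each result, then pull out their href attribute
links <- page %>%
  html_nodes(".search-results .list-rule a") %>%
  html_attr("href")

print(links)
```

Note that html_attr reads the attribute from the selected nodes themselves, so the selector has to end on the `a` elements rather than on their container; nodes without an `href` come back as `NA`.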
Upvotes: 1
Views: 309
Reputation: 65
I figured it out by adding a get_attr helper to the function. Updated code below.
scrape_change_page <- function(url) {
  webpage <- xml2::read_html(url)

  # Extract trimmed text for a CSS selector, padded to 10 entries
  get_text <- function(css) {
    vec <- rvest::html_text(rvest::html_nodes(webpage, css), trim = TRUE)
    if (length(vec) < 10) c(vec, rep("", 10 - length(vec))) else vec
  }

  # Extract an attribute (e.g. href) for a CSS selector, padded the same way
  get_attr <- function(css, attr) {
    vec <- rvest::html_attr(rvest::html_nodes(webpage, css), attr)
    if (length(vec) < 10) c(vec, rep("", 10 - length(vec))) else vec
  }

  tibble::tibble(
    title = get_text('.xs-mbs'),
    date = gsub("Created", "", get_text('.symbol-clock+ span')),
    supporters = gsub(" supporters", "", get_text('.symbol-supporters+ span')),
    addressee = gsub("Petition to ", "", get_text('.xs-mbn .type-s')),
    location = get_text('.plxxs'),
    link = get_attr('.search-results .list-rule', 'href')
  )
}
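The padding step in both helpers is what lets the columns be combined when a selector matches fewer than 10 nodes on a page. Here is a small base-R sketch of that idea; `pad_to` and the sample values are hypothetical and only mirror the logic above.

```r
# Hypothetical helper mirroring the padding used in get_text() and get_attr()
pad_to <- function(vec, n = 10) {
  if (length(vec) < n) c(vec, rep("", n - length(vec))) else vec
}

titles <- c("Petition A", "Petition B", "Petition C")
links  <- c("/p/a", "/p/b")  # one link missing on this page

# Without padding, the differing lengths (3 vs 2) would make data.frame() fail
page_df <- data.frame(
  title = pad_to(titles),
  link  = pad_to(links),
  stringsAsFactors = FALSE
)

print(nrow(page_df))
```

Padding appends blanks at the end, so if a value is missing from a result in the middle of a page, the later values in that column shift up by one row. It is worth spot-checking a few rows against the site.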
Upvotes: 1