silje

Reputation: 25

How to scrape specific information from website with several pages in R

I have just started with web scraping in R and I have trouble finding out how to scrape specific information from a website with several pages without having to run the code for each individual URL. So far I have managed to do it for the first page using this example: https://towardsdatascience.com/tidy-web-scraping-in-r-tutorial-and-resources-ac9f72b4fe47.

I have also managed to generate the URLs based on page number with this code:


library(stringr)

# url is the base address; appending '?page=' and a number gives one URL per page
list_of_pages <- str_c(url, '?page=', 1:32)
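
For example, with the Billboard address below as url, the first generated entries look like this (quick check in the console):

head(list_of_pages, 2)
#> [1] "https://www.billboard.com/charts/hot-100?page=1"
#> [2] "https://www.billboard.com/charts/hot-100?page=2"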

The problem is integrating this so that one function takes the generated URLs, scrapes the information, and stores it in a data frame. This is the code I have for scraping the information:

library(rvest)

hot100page <- "https://www.billboard.com/charts/hot-100"
hot100 <- read_html(hot100page)

# Extract the rank numbers from the chart
rank <- hot100 %>% 
  rvest::html_nodes('body') %>% 
  xml2::xml_find_all("//span[contains(@class, 'chart-element__rank__number')]") %>% 
  rvest::html_text()
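
What I have in mind is something like the sketch below, wrapping the scraping in one function and mapping it over the generated URLs with purrr (I am not sure this is the right approach):

library(tidyverse)
library(rvest)

# One function that scrapes a single page and returns its ranks as a tibble
scrape_page <- function(page_url) {
  page <- read_html(page_url)
  tibble(
    rank = page %>%
      rvest::html_nodes('body') %>%
      xml2::xml_find_all("//span[contains(@class, 'chart-element__rank__number')]") %>%
      rvest::html_text()
  )
}

# Map over every generated URL and row-bind the results into one data frame
result <- map_dfr(list_of_pages, scrape_page)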

This is an example of the structure of the website I plan to use the function for: https://www.amazon.com/s?k=statistics&ref=nb_sb_noss_2.

Upvotes: 2

Views: 136

Answers (2)

stevec

Reputation: 52268

Here's a way to do it using rvest. Keep in mind that this particular website (hot100) doesn't actually use pagination, so the ?page=1 part of the URL is meaningless (it just keeps loading the homepage). But for sites with pagination, this approach would work:

library(tidyverse)
library(rvest)

hot100page <- "https://www.billboard.com/charts/hot-100"

rank <- c()

for (i in 1:32) {

  print(paste0("Scraping page ", i))

  # Read this page and pull out the rank numbers
  temp <- paste0(hot100page, '?page=', i) %>% 
    read_html() %>% 
    rvest::html_nodes('body') %>% 
    xml2::xml_find_all("//span[contains(@class, 'chart-element__rank__number')]") %>% 
    rvest::html_text()

  # Append this page's results to the running vector
  rank <- c(rank, temp)
}

# Build the data frame once, after all pages have been scraped
# (assigning to a column of an empty data frame would error)
df <- data.frame(rank = rank)
df
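
If you want a second column too (song title, artist, etc.), collect another vector inside the same loop and add it when building the data frame. A sketch, assuming the title lives in a span whose class contains chart-element__information__song (that class name is a guess; check the page source for the actual one):

rank <- c()
title <- c()

for (i in 1:32) {
  page <- paste0(hot100page, '?page=', i) %>% read_html()
  rank <- c(rank,
            page %>%
              xml2::xml_find_all("//span[contains(@class, 'chart-element__rank__number')]") %>%
              rvest::html_text())
  # The class name below is an assumption; inspect the page source to confirm
  title <- c(title,
             page %>%
               xml2::xml_find_all("//span[contains(@class, 'chart-element__information__song')]") %>%
               rvest::html_text())
}

df <- data.frame(rank = rank, title = title)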

Upvotes: 1

Earl Mascetti

Reputation: 1336

I suggest you use RSelenium.

Below is a possible solution.

# Load the library
library(RSelenium) 

# Start a Selenium server and browser (you have to select one)
driver <- rsDriver(browser = c("firefox"), port = 4567L)

# Define the client part
remote_driver <- driver[["client"]]

# Send the website address to Firefox
remote_driver$navigate("https://www.amazon.com/s?k=statistics&ref=nb_sb_noss_2")

# An empty list to save the data
all_books <- list()

# A loop that scrapes each page, then clicks "Next"
for (i in 1:20) {
  # Sleep to wait until the page is available
  Sys.sleep(3)
  # Find the body element via CSS
  scroll_d <- remote_driver$findElement(using = "css", value = "body")
  # Tell the browser to scroll to the end of the page
  scroll_d$sendKeysToElement(list(key = "end"))
  # Get all books: title, price, ranking, etc.
  all_books[[i]] <- remote_driver$findElement(using = 'css selector', value = 'span.s-latency-cf-section:nth-child(4)')$getElementText()
  # Click the "Next" button
  next_button <- remote_driver$findElement(using = 'css selector', value = '.a-last')
  next_button$clickElement()
}

head(all_books)
[[1]]
[1] "1\nNew\nLife Goes On\nBTS\n-\n1\n1\n2\nFailing\nMood\n24kGoldn Featuring iann dior

Upvotes: 1
