Reputation: 37
I need to collect the links from 3 pages (150 links in total) using R with the rvest library. I used a for-loop to crawl through the pages. I know it's a very basic question, which has been answered elsewhere: R web scraping across multiple pages, Scrape and Loop with Rvest. I tried different versions of the following code. Most of them worked, but returned only 50 instead of 150 links.
library(rvest)
baseurl <- "https://www.ebay.co.uk/sch/i.html?_from=R40&_nkw=chain+and+sprocket&_sacat=0&_pgn="
n <- 1:3
nextpages <- paste0(baseurl, n)
for (i in nextpages) {
  html <- read_html(nextpages)
  links <- html %>% html_nodes("a.vip") %>% html_attr("href")
}
The code is expected to return all 150 links, not just 50.
Upvotes: 0
Views: 419
Reputation: 1502
You're overwriting the links variable in every iteration, so you end up with only the last page's 50 links.
Also, you loop with the i variable, but read_html() is called on nextpages, which is a vector of 3 URLs; read_html() expects a single URL, so you should be getting an error there.
Try this:
links <- c()
for (i in nextpages) {
  html <- read_html(i)
  # Append this page's links to the running vector instead of overwriting it
  links <- c(links, html %>% html_nodes("a.vip") %>% html_attr("href"))
}
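The overwrite-versus-accumulate difference can be seen without any scraping. A minimal sketch with dummy per-page link vectors (the pages list here is hypothetical, standing in for the three scraped result sets):

```r
# Three hypothetical pages, each yielding a character vector of links
pages <- list(c("a1", "a2"), c("b1", "b2"), c("c1", "c2"))

# Overwriting inside the loop: only the last page's links survive
links <- c()
for (p in pages) links <- p
length(links)  # 2, not 6

# Accumulating: concatenate each page's links onto the result
links <- c()
for (p in pages) links <- c(links, p)
length(links)  # 6
```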
Upvotes: 1
Reputation: 388907
We can use map instead of a for loop.
library(rvest)
library(purrr)
# nextpages is the vector of page URLs built in the question
map(nextpages, . %>% read_html %>%
      html_nodes("a.vip") %>%
      html_attr("href")) %>%
  flatten_chr()
#[1] "https://www.ebay.co.uk/itm/Genuine-Honda-Chain-and-sprocket-set-Honda-Cub-C50-C70-C90-Heavy-Duty/254287014069?hash=item3b34afe8b5:g:wjEAAOSwqaBdH69W"
#[2] "https://www.ebay.co.uk/itm/DID-Heavy-Duty-Drive-Chain-And-JT-Sprocket-Kit-For-Honda-MSX125-Grom-2013-2019/223130604262?hash=item33f39ed2e6:g:QmwAAOSwdrpcAQ4c"
#.....
#...
Upvotes: 1