djMohit

Reputation: 171

Scraping multiple sub-pages of multiple URLs

I have a list of base URLs as follows:

PostURL
www.abc.com/2315Azxc
www.abc.com/1478Bnbx
www.abc.com/6734Gytr
www.abc.com/8912Jqwe

Each URL has sub-pages like

www.abc.com/2315Azxc&page=1
www.abc.com/2315Azxc&page=2
www.abc.com/2315Azxc&page=3

I know how to scrape data from multiple sub-pages of one base URL using rvest:

library(rvest)

df <- lapply(paste0('www.abc.com/2315Azxc&page=', 1:3),
             function(url) {
               url %>%
                 read_html() %>%
                 html_nodes(".xg_border") %>%
                 html_text()
             })

But it takes a lot of attention and time to do this one base URL at a time. I am looking for a solution that can scrape data from the sub-pages of all base URLs at once.

Upvotes: 1

Views: 246

Answers (1)

Ronak Shah

Reputation: 388907

You could construct links to all the URLs using outer:

all_links <- c(t(outer(df$PostURL, paste0('&page=', 1:3), paste0)))
all_links

# [1] "www.abc.com/2315Azxc&page=1" "www.abc.com/2315Azxc&page=2" "www.abc.com/2315Azxc&page=3"
# [4] "www.abc.com/1478Bnbx&page=1" "www.abc.com/1478Bnbx&page=2" "www.abc.com/1478Bnbx&page=3"
# [7] "www.abc.com/6734Gytr&page=1" "www.abc.com/6734Gytr&page=2" "www.abc.com/6734Gytr&page=3"
#[10] "www.abc.com/8912Jqwe&page=1" "www.abc.com/8912Jqwe&page=2" "www.abc.com/8912Jqwe&page=3"

Now you can use the same lapply code to scrape each page.
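Putting the two pieces together, a minimal sketch might look like this (the URLs are the placeholders from the question, so the actual scraping call is shown but commented out; the ".xg_border" selector is taken from the question's code):

```r
library(rvest)

# Base URLs, as in the question's data
df <- data.frame(PostURL = c("www.abc.com/2315Azxc", "www.abc.com/1478Bnbx",
                             "www.abc.com/6734Gytr", "www.abc.com/8912Jqwe"))

# Every base-URL / page combination: 4 URLs x 3 pages = 12 links
all_links <- c(t(outer(df$PostURL, paste0('&page=', 1:3), paste0)))

# The same rvest pipeline from the question, applied to every link
scrape_all <- function(links) {
  lapply(links, function(url) {
    url %>%
      read_html() %>%
      html_nodes(".xg_border") %>%
      html_text()
  })
}

# results <- scrape_all(all_links)   # run against the real site
```

The `t()` wrapped around `outer` keeps the links grouped by base URL (all pages of the first URL, then all pages of the second, and so on) rather than interleaved by page number.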

data

df <- structure(list(PostURL = c("www.abc.com/2315Azxc", "www.abc.com/1478Bnbx", 
"www.abc.com/6734Gytr", "www.abc.com/8912Jqwe")), 
class = "data.frame", row.names = c(NA, -4L))

Upvotes: 1
