Eli

Reputation: 3

Scraping HTML nodes from multiple pages

I'm trying to scrape some data for a potential stats project, but I can't seem to get all of the nodes per page. Instead, it only grabs the first one before moving on to the next page.

library(rvest)

pages <- "https://merrimackathletics.com/sports/" %>%
          paste0(c("baseball", "mens-basketball", "mens-cross-country") %>%
          paste0("/roster"))

Major <- lapply(pages,
         function(url){
           url %>% read_html() %>%
           html_node(".sidearm-roster-player-major") %>%
           html_text()
})

However, the above only returns:

> Major
[[1]]
[1] "Business Adminstration"

[[2]]
[1] "Communications"

[[3]]
[1] "Global Management"

How should I go about selecting the nodes so that I get more than just the first "major" per page? Thanks!

Upvotes: 0

Views: 102

Answers (1)

Biblot

Reputation: 705

The function html_node only ever extracts the first matching element; html_nodes will do what you want.

From the documentation:

html_node is like [[: it always extracts exactly one element. When given a list of nodes, html_node will always return a list of the same length; the length of html_nodes might be longer or shorter.
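
For example, applying that change to the code from the question (a minimal sketch, assuming the same pages vector built there) returns every major on each roster page:

Major <- lapply(pages,
         function(url){
           # html_nodes() keeps every element matching the CSS class,
           # so each list entry now holds all majors on that page
           read_html(url) %>%
           html_nodes(".sidearm-roster-player-major") %>%
           html_text()
})

Each element of Major is then a character vector with one entry per player on the corresponding roster.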

Upvotes: 1
