I'm trying to scrape some data for a potential stats project, but I can't seem to get all of the nodes on each page. Instead, it only grabs the first one before moving on to the next page.
library(rvest)

pages <- "https://merrimackathletics.com/sports/" %>%
  paste0(c("baseball", "mens-basketball", "mens-cross-country")) %>%
  paste0("/roster")

Major <- lapply(pages, function(url) {
  url %>%
    read_html() %>%
    html_node(".sidearm-roster-player-major") %>%
    html_text()
})
As a result, the above only returns:
> Major
[[1]]
[1] "Business Adminstration"
[[2]]
[1] "Communications"
[[3]]
[1] "Global Management"
How should I go about indexing the node such that I get more than just the first "major" per page? Thanks!
The function html_node only extracts the first element; html_nodes will do what you want.
From the documentation:

html_node is like [[: it always extracts exactly one element. When given a list of nodes, html_node will always return a list of the same length; the length of html_nodes might be longer or shorter.
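
For example, swapping in html_nodes makes the same loop return every major on each page. A minimal sketch, assuming the pages vector and the .sidearm-roster-player-major selector from the question are unchanged:

library(rvest)

Major <- lapply(pages, function(url) {
  url %>%
    read_html() %>%
    html_nodes(".sidearm-roster-player-major") %>%  # plural: all matching nodes
    html_text()
})

Each element of Major is then a character vector with one entry per player, so the lengths can differ from page to page. (In rvest 1.0 and later, html_elements is the preferred name for html_nodes.)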