Reputation: 4037
According to the documentation, html_nodes() from rvest should return (quote): "When applied to a list of nodes, html_nodes() returns all nodes, collapsing results into a new nodelist."
In my case, however, it returns a single string in which the text of every node is collapsed together. Why does it behave this way? While debugging I could not get it to return anything else; it always returns the same string, with the page numbers run together:
123456789101112131415...4950
library(tidyverse)
library(rvest)
library(stringr)
library(rebus)
library(lubridate)
url <- 'https://footballdatabase.com/ranking/world/1'
html <- read_html(url)
get_last_page <- function(html) {
  pages_data <- html %>%
    # The '.' indicates the class
    html_nodes('.pagination') %>%
    # Extract the raw text as a list
    html_text()
  # The second to last of the buttons is the one
  pages_data[(length(pages_data) - 1)] %>%
    unname() %>%
    # Convert to number
    as.numeric()
}
I also tried wrapping the output in list(), without success. Using html_node() instead did not solve the problem either.
Upvotes: 0
Views: 139
Reputation: 34441
The selector '.pagination' matches only a single node, so when html_text() is applied, all the text inside that node is returned collapsed into one string. Change the CSS selector to include the anchors inside that node, then extract the text; html_text() then returns a character vector with one element per anchor.
html %>%
  html_nodes('.pagination a') %>%
  html_text()
[1] "1" "2" "3" "4" "5" "6" "7" "8" "9" "10" "11" "12" "13" "14" "15" "16" "17" "18" "19" "20" "21" "22" "23" "24" "25" "26" "27" "28" "29" "30" "31" "32"
[33] "33" "34" "35" "36" "37" "38" "39" "40" "41" "42" "43" "44" "45" "46" "47" "48" "49" "50"
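To see the difference without hitting the site, here is a minimal sketch on an inline HTML snippet. The markup is a hypothetical stand-in for the real page, and the rewritten get_last_page() is only one possible approach: it assumes the last page is the largest purely numeric link label, rather than relying on a link's position.

```r
library(rvest)

# Hypothetical stand-in for the real page: one .pagination container, four links
html <- read_html('<div class="pagination"><a>1</a><a>2</a><a>3</a><a>Next</a></div>')

# One matching node, so html_text() returns one collapsed string: "123Next"
collapsed <- html %>%
  html_nodes('.pagination') %>%
  html_text()

# One node per anchor, so html_text() returns one string per link
pages <- html %>%
  html_nodes('.pagination a') %>%
  html_text()

# Assumption: the last page is the largest numeric label among the links
get_last_page <- function(html) {
  labels <- html %>%
    html_nodes('.pagination a') %>%
    html_text()
  # Drop non-numeric labels such as "Next" before converting
  max(as.numeric(labels[grepl('^[0-9]+$', labels)]))
}

get_last_page(html)
```

Taking the maximum rather than the second-to-last element avoids depending on whether the pager ends with a "next" button, which may differ from site to site.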
Upvotes: 1