Reputation: 1
I'm trying to scrape a table with rvest, but it doesn't recognize the numbers and it creates two extra columns filled with NAs.
This worked a few months ago, but the website has apparently changed since then and it no longer works. I don't know what the problem might be.
library(rvest)
url <- "https://climatologia.meteochile.gob.cl/application/mensual/temperaturaMediaMensual/170007/2021/08"
tmp <- read_html(url)
tmp <- html_nodes(tmp, "table")
sapply(tmp, function(x) dim(html_table(x, fill = TRUE))) ## check which table holds the data
tabla <- html_table(tmp[1], fill = TRUE, header = NA, dec = ".")
Upvotes: 0
Views: 98
Reputation: 84465
I don't see a problem with recognising numbers. There are two empty columns in the HTML, hence the NAs, and most of the table is blank.
As there are repeated headers, I use janitor to clean the header names, then dplyr to remove the end columns, which are automatically labelled x and x_2. You could also slice the end columns off by position instead.
I would probably also consider removing the Resumen Mensual part of the current table, or putting it into a separate table.
library(rvest)
library(janitor)
library(dplyr)
url <- "https://climatologia.meteochile.gob.cl/application/mensual/temperaturaMediaMensual/170007/2021/08"
t <- read_html(url) |>
  html_element('#excel > table') |>  # the table inside the element with id "excel"
  html_table() |>
  clean_names() |>                   # repeated/empty headers become unique names (x, x_2)
  select(!starts_with('x'))          # drop the two empty end columns
t
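The slicing alternative mentioned above could look roughly like this. It is only a sketch: to keep it runnable offline it uses a small made-up table built with `minimal_html()` (the column names `Dia` and `Media` are hypothetical stand-ins, not the real page's headers), and it assumes the empty columns are always the last two.

```r
library(rvest)

# Hypothetical stand-in for the real page: every row ends in two empty cells.
page <- minimal_html('
  <table>
    <tr><th>Dia</th><th>Media</th><th></th><th></th></tr>
    <tr><td>1</td><td>10.5</td><td></td><td></td></tr>
    <tr><td>2</td><td>11.2</td><td></td><td></td></tr>
  </table>
')

raw <- page |>
  html_element("table") |>
  html_table()

# Keep everything except the last two columns, selecting by position
trimmed <- raw[, seq_len(ncol(raw) - 2)]
trimmed
```

Selecting by position avoids relying on the cleaned names, but it will silently drop real data if the site ever stops emitting the two empty columns.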
The base pipe |> requires R 4.1.0 or later. On older versions you can replace it with the %>% pipe from magrittr.
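For R versions before 4.1.0, the same pipeline with `%>%` might look like the sketch below. It runs against a small made-up table (built with `minimal_html()`, with hypothetical headers `Dia` and `Media`) rather than the live site, so the result is illustrative only; dplyr re-exports `%>%`, so magrittr does not need to be attached separately.

```r
library(rvest)
library(janitor)
library(dplyr)  # re-exports %>% from magrittr

# Hypothetical stand-in for the real page, with two empty end columns
page <- minimal_html('
  <table>
    <tr><th>Dia</th><th>Media</th><th></th><th></th></tr>
    <tr><td>1</td><td>10.5</td><td></td><td></td></tr>
  </table>
')

res <- page %>%
  html_element("table") %>%
  html_table() %>%
  clean_names() %>%              # empty headers get placeholder names starting with "x"
  select(!starts_with("x"))      # drop those placeholder columns
res
```

Against the real page you would swap `page` for `read_html(url)` and use the `'#excel > table'` selector from the answer.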
Upvotes: 1