Ricardo

Reputation: 1

Web scraping with the rvest package doesn't work

I'm trying to get a table with rvest but it doesn't recognize the numbers and creates two extra columns with NAs

A few months ago it worked, but apparently they made changes to the website and now it doesn't work. I don't know what the problem may be.

library(rvest)

url <- paste0("https://climatologia.meteochile.gob.cl/application/mensual/temperaturaMediaMensual/170007/2021/08")
tmp <- read_html(url)
tmp <- html_nodes(tmp, "table")
sapply(tmp, function(x) dim(html_table(x, fill = TRUE))) ## see which table holds the data
tabla <- html_table(tmp[1], fill = TRUE, header = NA, dec = ".")

Upvotes: 0

Views: 98

Answers (1)

QHarr

Reputation: 84465

I don't see a problem with recognising numbers. There are two empty columns in the html, hence the NAs, and most of the table is blank.

As there are repeated headers, I use janitor to clean the column names, then dplyr to remove the end columns, which are automatically labelled x and x_2. You could also slice the end columns off instead.
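For the slicing alternative, here is a minimal sketch. To keep it runnable offline it uses a small made-up HTML fragment rather than the real page; the column names Dia and Media and the values are assumptions for illustration only, and the only thing it shares with the real table is the two empty trailing columns.

```r
library(rvest)

# Hypothetical stand-in for the real page: the last two columns are empty,
# which is what produces the NA columns after html_table().
html <- read_html('
  <div id="excel">
    <table>
      <tr><th>Dia</th><th>Media</th><th></th><th></th></tr>
      <tr><td>1</td><td>10.5</td><td></td><td></td></tr>
      <tr><td>2</td><td>11.2</td><td></td><td></td></tr>
    </table>
  </div>
')

tbl <- html |>
  html_element("#excel > table") |>
  html_table()

# Drop the last two columns by position instead of by name.
sliced <- tbl[seq_len(ncol(tbl) - 2)]
```

Slicing by position is shorter, but it silently removes real data if the site ever drops the empty columns, so selecting by name is probably the safer choice.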

I would also consider removing the Resumen Mensual part of the current table, or moving it into a separate table.

library(rvest)
library(janitor)
library(dplyr)

url <- paste0("https://climatologia.meteochile.gob.cl/application/mensual/temperaturaMediaMensual/170007/2021/08")

t <- read_html(url) |> 
  html_element('#excel > table') |>
  html_table() |>  
  clean_names() |> 
  select(!starts_with('x'))

t

The new base pipe |> requires R 4.1.0 or later. On older versions, you can replace it with the %>% pipe from magrittr.
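As a rough sketch of the same pipeline written with %>%, again run against a made-up HTML fragment so it works without network access (the Dia and Media columns are hypothetical, not taken from the real page):

```r
library(rvest)
library(janitor)
library(dplyr)
library(magrittr)  # provides %>%; dplyr and rvest also re-export it

# Hypothetical stand-in for the real page, with two empty trailing columns.
html <- read_html('
  <div id="excel">
    <table>
      <tr><th>Dia</th><th>Media</th><th></th><th></th></tr>
      <tr><td>1</td><td>10.5</td><td></td><td></td></tr>
      <tr><td>2</td><td>11.2</td><td></td><td></td></tr>
    </table>
  </div>
')

cleaned <- html %>%
  html_element("#excel > table") %>%
  html_table() %>%
  clean_names() %>%              # empty headers become x and x_2
  select(!starts_with("x"))      # drop those placeholder columns
```

To run it against the live site, swap the inline fragment for read_html(url).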

Upvotes: 1
