Reputation: 11
I'm trying to get data from the table with all the colors from the following website: https://azdeq.gov/aq/ytd?year=2022&pollutant=pm25&location=pinal&type=conc#mtop
Here's what I've tried so far. Any help is much appreciated.
library(rvest)
# Scrape the table from the website
table <- read_html("https://azdeq.gov/aq/ytd?year=2022&pollutant=pm25&location=pinal&type=conc#mtop") %>%
  html_nodes(xpath = '//*[@id="node-5748"]/div/div/div/div/div[5]') %>%
  html_table()
Upvotes: 0
Views: 61
Reputation: 206401
The problem is that the data isn't stored in an actual HTML table. It's stored in a bunch of div tags, so html_table() can't parse it. You could do a bit of your own processing. For example:
library(rvest)
page <- read_html("https://azdeq.gov/aq/ytd?year=2022&pollutant=pm25&location=pinal&type=conc#mtop")
# Take the second div.divPollYTD block on the page
block <- html_elements(page, "div.divPollYTD")[[2]]
# Each .divPollRowYTD is one row; the text of the divs inside it gives the cells
rows <- lapply(html_elements(block, ".divPollRowYTD"), function(row) {
  html_text(html_elements(row, "div"))
})
do.call("rbind", rows)
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
# [1,] "Day" "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov" "Dec"
# [2,] "1" "7.1" "11.2" "6.9" "5.7" "13" "20.3" "8.7" "3.4" "13.2" "3" "8.4" "7"
# [3,] "2" "6.5" "10.3" "15.1" "5.6" "14.7" "18.9" "13.2" "3.7" "15.2" "4.9" "13.5" "8.2"
# [4,] "3" "6.2" "11" "10.9" "5.3" "12.4" "14.7" "7.6" "5.1" "3.5" "57.8" "7.7" "7.1"
# [5,] "4" "8.3" "11.7" "6.7" "6.7" "7.4" "11.2" "10.5" "2.2" "10.5" "4.9" "6.9" "3.7"
# [6,] "5" "13.6" "7.1" "9.4" "6.8" "16" "8.9" "7.2" "4" "19.5" "6.6" "9.5" "3.4"
# etc...
This returns a character matrix, which you could coerce to a data.frame or whatever else you like.
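As a rough sketch of that last step, here is one way the coercion could look in base R. The matrix m below is just a hand-typed stand-in built from a few of the values shown above (so it runs without hitting the website), and the names m and df are only for illustration. It assumes the first row is the header and that every other cell is meant to be numeric.

```r
# A small character matrix standing in for the scraped result
# (values copied from the first rows and columns of the output above).
m <- rbind(
  c("Day", "Jan", "Feb", "Mar"),
  c("1", "7.1", "11.2", "6.9"),
  c("2", "6.5", "10.3", "15.1"),
  c("3", "6.2", "11", "10.9")
)

# Use the first row as column names and the remaining rows as data.
df <- as.data.frame(m[-1, , drop = FALSE], stringsAsFactors = FALSE)
names(df) <- m[1, ]

# Convert every column from character to numeric. Cells that are not
# numbers (for example, blanks) become NA; the coercion warning is suppressed.
df[] <- lapply(df, function(x) suppressWarnings(as.numeric(x)))

str(df)
```

With the real result you would replace m with the matrix returned by do.call("rbind", ...). If the page uses blanks or other placeholders for missing days, those would end up as NA, so it is worth checking the result before relying on it.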
Upvotes: 2