Nate

Reputation: 11

Scraping data from a table on a website using XPath

I'm trying to get data from the table with all the colors from the following website: https://azdeq.gov/aq/ytd?year=2022&pollutant=pm25&location=pinal&type=conc#mtop

Here's what I've done.

  1. Inspected element & found table
  2. Copied table's XPath: //*[@id="node-5748"]/div/div/div/div/div[5]
  3. Spent way more time on this simple bit of code than I was hoping
  4. Table is empty...same results using css & selector
  5. I've used other methods to get access to some of the data, but the blanks are not showing up and throwing things off.

Any help is much appreciated.

library(rvest)

# Scrape the table from the website
table <- read_html("https://azdeq.gov/aq/ytd?year=2022&pollutant=pm25&location=pinal&type=conc#mtop") %>%
  html_nodes(xpath='//*[@id="node-5748"]/div/div/div/div/div[5]') %>%
  html_table()

Upvotes: 0

Views: 61

Answers (1)

MrFlick

Reputation: 206401

The problem is that the data isn't stored in an actual HTML table. It's laid out in a bunch of nested div tags, so it seems html_table() can't parse it. You can do a bit of your own processing instead. For example:

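library(rvest)

page <- read_html("https://azdeq.gov/aq/ytd?year=2022&pollutant=pm25&location=pinal&type=conc#mtop")
# Several div.divPollYTD blocks match; take the second one
block <- html_elements(page, "div.divPollYTD")[[2]]

# Collect the cell text of each row, then stack the rows into a matrix
# (the `_` placeholder for the native pipe needs R >= 4.2)
lapply(block %>% html_elements(".divPollRowYTD"), function(row)
  row %>% html_elements("div") %>% html_text()
) |>
  do.call("rbind", args = _)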
#       [,1]  [,2]   [,3]   [,4]   [,5]   [,6]   [,7]   [,8]   [,9]   [,10]  [,11]  [,12]  [,13] 
#  [1,] "Day" "Jan"  "Feb"  "Mar"  "Apr"  "May"  "Jun"  "Jul"  "Aug"  "Sep"  "Oct"  "Nov"  "Dec" 
#  [2,] "1"   "7.1"  "11.2" "6.9"  "5.7"  "13"   "20.3" "8.7"  "3.4"  "13.2" "3"    "8.4"  "7"   
#  [3,] "2"   "6.5"  "10.3" "15.1" "5.6"  "14.7" "18.9" "13.2" "3.7"  "15.2" "4.9"  "13.5" "8.2" 
#  [4,] "3"   "6.2"  "11"   "10.9" "5.3"  "12.4" "14.7" "7.6"  "5.1"  "3.5"  "57.8" "7.7"  "7.1" 
#  [5,] "4"   "8.3"  "11.7" "6.7"  "6.7"  "7.4"  "11.2" "10.5" "2.2"  "10.5" "4.9"  "6.9"  "3.7" 
#  [6,] "5"   "13.6" "7.1"  "9.4"  "6.8"  "16"   "8.9"  "7.2"  "4"    "19.5" "6.6"  "9.5"  "3.4" 
#   etc...

This returns a character matrix, with the month names in the first row, but you can coerce that to a data.frame or whatever else you like.
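If you want a data.frame, here is one possible way to do the coercion in base R. It is only a sketch: the matrix `m` below is hard-coded from the first few rows and columns of the output above so that it runs without hitting the website, and the names `m` and `df` are just placeholders. With the real result you would assign the output of the `do.call()` call to `m` instead.

```r
# Stand-in for the scraped result: first rows/columns of the output shown above
m <- rbind(
  c("Day", "Jan", "Feb",  "Mar"),
  c("1",   "7.1", "11.2", "6.9"),
  c("2",   "6.5", "10.3", "15.1"),
  c("3",   "6.2", "11",   "10.9")
)

# Drop the header row, then use it for the column names
df <- as.data.frame(m[-1, , drop = FALSE], stringsAsFactors = FALSE)
names(df) <- m[1, ]

# Convert every column from character to numeric;
# cells that are empty strings come out as NA
df[] <- lapply(df, function(x) as.numeric(trimws(x)))

str(df)
```

Because empty cells turn into NA rather than disappearing, the rows and columns stay aligned, which may help with the blanks mentioned in the question.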

Upvotes: 2
