Reputation: 13
I'd like to make a list of every election date listed here: https://voterportal.sos.la.gov/static/ so that I can then travel to each respective election site and download and compile the spreadsheets called "Excel - Complete Results".
Normally I'd go about this by using rvest to get every date listed on the linked site, then map over the dates to reach each election site (just the election date appended to the parent site URL, e.g. "https://voterportal.sos.la.gov/static/2022-04-30"), and then read in the Excel files linked on those sites. But I'm running into a problem with html_elements() that I haven't encountered before.
I tried to use html_element() to pull the dates:
library(rvest)

la_elections_url <- "https://voterportal.sos.la.gov/static/"
la_elections_text <- read_html(la_elections_url)
la_elections_text %>% html_element("a")
which I thought I'd then be able to filter down to the href attributes like:
html_attr(html_elements(la_elections_text, "a"), "href") %>% as.list()
to get a list of the election dates. But instead of a warning with results, html_element() returns a missing node:
la_elections_text %>% html_element("a")
{xml_missing}
<NA>
Upvotes: 1
Views: 171
Reputation: 7405
This website uses XHR to load its data, which makes DOM-based scraping with rvest a bit trickier: the anchor tags you're looking for aren't in the initial HTML at all. Luckily, you can use your browser's DevTools (Network tab) to grab the URL the page fetches, and request that data yourself.
Using httr, this becomes pretty easy:
library(httr)
library(tidyverse)
res <- httr::GET('https://voterportal.sos.la.gov/ElectionResults/ElectionResults/Data?blob=ElectionDates.htm')
res_list <- httr::content(res)
res_list$Dates$Date %>%
  purrr::map(~ .x$ElectionDate)
Which gives you:
[[1]]
[1] "04/29/2023"
[[2]]
[1] "03/25/2023"
[[3]]
[1] "02/18/2023"
[[4]]
[1] "01/14/2023"
[[5]]
[1] "12/10/2022"
.....
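From here, if you want the per-election URLs the question describes (the parent URL plus a YYYY-MM-DD date), note that the endpoint returns dates as MM/DD/YYYY strings, so they need reformatting first. A minimal sketch using a few of the dates returned above (assuming the static site keeps the YYYY-MM-DD path convention shown in the question):

```r
# Example dates as returned by the endpoint above (MM/DD/YYYY strings).
dates_mdy <- c("04/29/2023", "03/25/2023", "02/18/2023")

# Reformat to YYYY-MM-DD, the form used in the static-site URLs.
dates_iso <- format(as.Date(dates_mdy, format = "%m/%d/%Y"), "%Y-%m-%d")

# Build each election-site URL.
election_urls <- paste0("https://voterportal.sos.la.gov/static/", dates_iso)

election_urls[1]
# "https://voterportal.sos.la.gov/static/2023-04-29"
```

You could then map over election_urls to locate and download each "Excel - Complete Results" link.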
Upvotes: 1