OGC

Reputation: 274

R - Using SelectorGadget to grab a dataset

I am trying to grab Hawaii-specific data from this site: https://www.opentable.com/state-of-industry. I want the Hawaii data from every table on the page. On the site, this data only appears after selecting the State tab.

In R, I am trying to use the rvest library together with SelectorGadget.

So far I've tried:

library(rvest)

html <- read_html("https://www.opentable.com/state-of-industry")

html %>%
  html_element("tbody") %>%
  html_table()

However, this isn't giving me what I am looking for: I get the Global dataset in a tibble instead. Any suggestions on how to grab the Hawaii dataset from the State tab?

Also, is there a way to download the dataset programmatically, as if clicking the Download dataset tab? I could then work from the CSV file.

Upvotes: 0

Views: 247

Answers (1)

QHarr

Reputation: 84465

All of the page's data is stored in a script tag, and the browser pulls from it dynamically; that is why reading the static HTML table only returns the Global view. You can use a regex to extract the JavaScript object containing all the data, then write a custom function to pull out just the info for Hawaii, as shown below. The function get_state_index accepts a state argument in case you wish to view other states' information.

library(rvest)
library(jsonlite)
library(magrittr)
library(stringr)
library(purrr)
library(dplyr)

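# Return the position of the first entry in `states` whose name equals `state`
# (NA if there is no match).
get_state_index <- function(states, state) {
  match(TRUE, map_lgl(states, ~ isTRUE(.x$name == state)))
}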

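# Read the page as text; the data sits inside a script tag.
s <- read_html("https://www.opentable.com/state-of-industry") %>% html_text()
# Capture the object assigned to __INITIAL_STATE__ and parse it as JSON.
all_data <- jsonlite::parse_json(stringr::str_match(s, "__INITIAL_STATE__ = (.*?\\});w\\.")[, 2])
fullbook <- all_data$covidDataCenter$fullbook

# headers holds the dates; yoy holds the values for the chosen state.
hawaii_dataset <- tibble(
  date = fullbook$headers %>% unlist() %>% as.Date(),
  yoy = fullbook$states[[get_state_index(fullbook$states, "Hawaii")]]$yoy %>% unlist()
)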
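
To make the regex step easier to follow, here is a minimal offline sketch of the same idea. The string script_text is a made-up miniature of the script content (the real page is much larger and its layout may differ), and hawaii_toy and csv_path are hypothetical names used only for illustration. The last lines show one way to end up with a CSV file, since working from a CSV was mentioned in the question.

```r
library(stringr)
library(jsonlite)

# Hypothetical miniature of the script text; NOT the real page content.
script_text <- 'w.__INITIAL_STATE__ = {"covidDataCenter":{"fullbook":{"headers":["2021-01-01","2021-01-02"],"states":[{"name":"Alaska","yoy":[-10,-12]},{"name":"Hawaii","yoy":[-55,-50]}]}}};w.__OTHER__ = {};'

# Lazily capture from the opening brace up to the first "}" that is
# immediately followed by ";w."
json_text <- str_match(script_text, "__INITIAL_STATE__ = (.*?\\});w\\.")[, 2]

toy <- parse_json(json_text)
fullbook <- toy$covidDataCenter$fullbook

# Locate Hawaii by name among the state entries.
state_names <- vapply(fullbook$states, function(x) x$name, character(1))
hawaii <- fullbook$states[[match("Hawaii", state_names)]]

hawaii_toy <- data.frame(
  date = as.Date(unlist(fullbook$headers)),
  yoy = unlist(hawaii$yoy)
)

# Write the result to a CSV file (a temporary file here).
csv_path <- tempfile(fileext = ".csv")
write.csv(hawaii_toy, csv_path, row.names = FALSE)
```

The same write.csv call should work on hawaii_dataset from the code above, for example write.csv(hawaii_dataset, "hawaii.csv", row.names = FALSE).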


Upvotes: 1
