as21

Reputation: 13

In rvest, how can I select a checkbox to access different content on a page?

I am trying to use rvest to scrape from the following URL:

https://www.sportsline.com/mlb/odds/money-line/

However, when I request this page, I am redirected to a different URL:

https://www.sportsline.com/mlb/odds/

To reach the page I'm targeting, I need to select a checkbox.

Is it possible to do this with rvest? I attempted a solution using RSelenium but ran into issues there as well.

The code below is what I am currently using.

library(rvest)

url <- 'https://www.sportsline.com/mlb/odds/money-line/'
page <- read_html(url)  # this request ends up on /mlb/odds/ instead
# game times and team names
GameTime <- page %>% html_nodes(".game-details") %>% html_text()
VisTm <- page %>% html_nodes(".away-team .cfYQTQ") %>% html_text()
HmTm <- page %>% html_nodes(".home-team .cfYQTQ") %>% html_text()
# primary odds shown for the away and home teams
Away_Odds <- page %>% html_nodes(".away-team .projected-score+ td .primary") %>% html_text()
Close <- page %>% html_nodes(".home-team .projected-score+ td .primary") %>% html_text()

The problem is that although the URL points at the '/money-line/' page, rvest is redirected and returns the text of the wrong page.

Any thoughts? Thanks.

Upvotes: 0

Views: 95

Answers (1)

margusl

Reputation: 17504

The initial page state, including the data for the Money Line table, is embedded as JSON in <script id="__NEXT_DATA__" type="application/json"> ... </script>, so it can be read from the plain /mlb/odds/ page without clicking the checkbox. Here's some manual rectangling:

library(jsonlite)
library(rvest)
library(dplyr)
library(tidyr)
library(purrr)

# grab the JSON payload from the <script id="__NEXT_DATA__"> tag
next_data <- read_html("https://www.sportsline.com/mlb/odds/") %>% 
  html_elements("#__NEXT_DATA__") %>% html_text()

odds <- parse_json(next_data) %>% 
  # drill down to the list of games
  pluck("props", "initialState", "oddsPageState", "pageState", "data", "competitionOdds") %>% 
  tibble(co = .) %>% 
  # game-level fields become columns
  hoist(co, home = list("homeTeam", "nickName"),
            away = list("awayTeam", "nickName"),
            "venueCity", "venueName", "startDate") %>% 
  hoist(co, "odds") %>% 
  # one row per sportsbook, then spread its fields into columns
  unnest_longer(odds) %>% 
  unnest_wider(odds) %>% 
  hoist(odd, "moneyLine") %>% 
  unnest_wider(moneyLine) %>% 
  select(home:isBestHomeLine)

Result:

glimpse(odds)
#> Rows: 50
#> Columns: 12
#> $ home            <chr> "Tigers", "Tigers", "Tigers", "Tigers", "Tigers", "Phi…
#> $ away            <chr> "Orioles", "Orioles", "Orioles", "Orioles", "Orioles",…
#> $ venueCity       <chr> "Lakeland", "Lakeland", "Lakeland", "Lakeland", "Lakel…
#> $ venueName       <chr> "Publix Field at Joker Marchant Stadium", "Publix Fiel…
#> $ startDate       <chr> "2023-02-26T18:05:00.000Z", "2023-02-26T18:05:00.000Z"…
#> $ sportsbookName  <chr> "consensus", "whnj", "draftkings", "fanduel", "bet365n…
#> $ currentHomeOdds <chr> "-133", "-140", "-130", "", "-130", "-130", "-130", "-…
#> $ currentAwayOdds <chr> "+112", "+118", "+110", "", "+110", "+109", "+110", "+…
#> $ openingHomeOdds <chr> "-123", "-120", "-120", "", "-120", "-134", "-135", "-…
#> $ openingAwayOdds <chr> "+103", "+100", "+100", "", "+100", "+113", "+115", "+…
#> $ isBestAwayLine  <lgl> FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, F…
#> $ isBestHomeLine  <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE…

odds
#> # A tibble: 50 × 12
#>    home    away  venue…¹ venue…² start…³ sport…⁴ curre…⁵ curre…⁶ openi…⁷ openi…⁸
#>    <chr>   <chr> <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>  
#>  1 Tigers  Orio… Lakela… Publix… 2023-0… consen… "-133"  "+112"  "-123"  "+103" 
#>  2 Tigers  Orio… Lakela… Publix… 2023-0… whnj    "-140"  "+118"  "-120"  "+100" 
#>  3 Tigers  Orio… Lakela… Publix… 2023-0… draftk… "-130"  "+110"  "-120"  "+100" 
#>  4 Tigers  Orio… Lakela… Publix… 2023-0… fanduel ""      ""      ""      ""     
#>  5 Tigers  Orio… Lakela… Publix… 2023-0… bet365… "-130"  "+110"  "-120"  "+100" 
#>  6 Philli… Twins Clearw… BayCar… 2023-0… consen… "-130"  "+109"  "-134"  "+113" 
#>  7 Philli… Twins Clearw… BayCar… 2023-0… whnj    "-130"  "+110"  "-135"  "+115" 
#>  8 Philli… Twins Clearw… BayCar… 2023-0… draftk… "-130"  "+110"  "-135"  "+115" 
#>  9 Philli… Twins Clearw… BayCar… 2023-0… fanduel ""      ""      ""      ""     
#> 10 Philli… Twins Clearw… BayCar… 2023-0… bet365… "-125"  "+105"  "-135"  "+115" 
#> # … with 40 more rows, 2 more variables: isBestAwayLine <lgl>,
#> #   isBestHomeLine <lgl>, and abbreviated variable names ¹​venueCity,
#> #   ²​venueName, ³​startDate, ⁴​sportsbookName, ⁵​currentHomeOdds,
#> #   ⁶​currentAwayOdds, ⁷​openingHomeOdds, ⁸​openingAwayOdds

Created on 2023-02-26 with reprex v2.0.2
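Because the data sits in a static script tag, the extract-then-rectangle approach can be tried offline. The sketch below uses a hypothetical, trimmed-down page: its JSON layout (props, pageState, games) and the team and sportsbook names are invented for illustration and are not the real sportsline.com schema. It also converts the money-line strings to numbers, assuming (as the fanduel rows above suggest) that an empty string means no line was posted.

```r
library(jsonlite)
library(rvest)
library(dplyr)
library(tidyr)
library(purrr)

# Hypothetical page; the JSON structure is invented, NOT sportsline.com's schema
html <- '<html><body>
<script id="__NEXT_DATA__" type="application/json">
{"props": {"pageState": {"games": [
  {"homeTeam": {"nickName": "HomeA"}, "awayTeam": {"nickName": "AwayA"},
   "odds": [
     {"sportsbookName": "book1", "moneyLine": {"home": "-130", "away": "+110"}},
     {"sportsbookName": "book2", "moneyLine": {"home": "", "away": ""}}
   ]}
]}}}
</script>
</body></html>'

# read_html() accepts literal HTML; for a live page, pass the URL instead
next_data <- read_html(html) %>%
  html_element("#__NEXT_DATA__") %>%
  html_text()

games <- parse_json(next_data) %>%
  pluck("props", "pageState", "games") %>%
  tibble(g = .) %>%
  # pull nested team names and the odds list out into their own columns
  hoist(g, home = list("homeTeam", "nickName"),
           away = list("awayTeam", "nickName"),
           "odds") %>%
  unnest_longer(odds) %>%                     # one row per sportsbook
  unnest_wider(odds) %>%                      # sportsbookName, moneyLine
  unnest_wider(moneyLine, names_sep = "_") %>% # moneyLine_home, moneyLine_away
  # "+110" parses as 110; a blank string becomes NA without a warning
  mutate(across(starts_with("moneyLine_"), as.numeric))

games
```

On the live page the pluck() path from the answer applies instead; since that path mirrors the site's current page structure, it may change without notice.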

Upvotes: 1
