I am trying to use rvest to scrape from the following URL:
https://www.sportsline.com/mlb/odds/money-line/
However, when I try to access this page, I am redirected to a different URL:
https://www.sportsline.com/mlb/odds/
To reach the page I'm targeting, I have to tick a checkbox on the page.
Is it possible to do this using rvest? I attempted a solution using RSelenium, but ran into issues there as well.
The code below is what I am currently using.
library(rvest)
url <- 'https://www.sportsline.com/mlb/odds/money-line/'
page <- read_html(url)
# Each selector below targets one column of the money-line table
GameTime <- page %>% html_nodes(".game-details") %>% html_text()
VisTm <- page %>% html_nodes(".away-team .cfYQTQ") %>% html_text()
HmTm <- page %>% html_nodes(".home-team .cfYQTQ") %>% html_text()
Away_Odds <- page %>% html_nodes(".away-team .projected-score+ td .primary") %>% html_text()
Close <- page %>% html_nodes(".home-team .projected-score+ td .primary") %>% html_text()
The problem is that, although the URL points at the '/money-line/' page, rvest gets redirected and returns the text from the wrong page.
Any thoughts? Thanks.
Reputation: 17504
The initial state, including the data for the Money Line table, is embedded as JSON in <script id="__NEXT_DATA__" type="application/json"> ... </script>.
Here's some manual rectangling:
library(jsonlite)
library(rvest)
library(dplyr)
library(tidyr)
library(purrr)
next_data <- read_html("https://www.sportsline.com/mlb/odds/") %>%
  html_elements("#__NEXT_DATA__") %>%
  html_text()
odds <- parse_json(next_data) %>%
  pluck("props", "initialState", "oddsPageState", "pageState", "data", "competitionOdds") %>%
  tibble(co = .) %>%
  hoist(co,
        home = list("homeTeam", "nickName"),
        away = list("awayTeam", "nickName"),
        "venueCity", "venueName", "startDate") %>%
  hoist(co, "odds") %>%
  unnest_longer(odds) %>%
  unnest_wider(odds) %>%
  hoist(odd, "moneyLine") %>%
  unnest_wider(moneyLine) %>%
  select(home:isBestHomeLine)
Result:
glimpse(odds)
#> Rows: 50
#> Columns: 12
#> $ home <chr> "Tigers", "Tigers", "Tigers", "Tigers", "Tigers", "Phi…
#> $ away <chr> "Orioles", "Orioles", "Orioles", "Orioles", "Orioles",…
#> $ venueCity <chr> "Lakeland", "Lakeland", "Lakeland", "Lakeland", "Lakel…
#> $ venueName <chr> "Publix Field at Joker Marchant Stadium", "Publix Fiel…
#> $ startDate <chr> "2023-02-26T18:05:00.000Z", "2023-02-26T18:05:00.000Z"…
#> $ sportsbookName <chr> "consensus", "whnj", "draftkings", "fanduel", "bet365n…
#> $ currentHomeOdds <chr> "-133", "-140", "-130", "", "-130", "-130", "-130", "-…
#> $ currentAwayOdds <chr> "+112", "+118", "+110", "", "+110", "+109", "+110", "+…
#> $ openingHomeOdds <chr> "-123", "-120", "-120", "", "-120", "-134", "-135", "-…
#> $ openingAwayOdds <chr> "+103", "+100", "+100", "", "+100", "+113", "+115", "+…
#> $ isBestAwayLine <lgl> FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, F…
#> $ isBestHomeLine <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE…
odds
#> # A tibble: 50 × 12
#> home away venue…¹ venue…² start…³ sport…⁴ curre…⁵ curre…⁶ openi…⁷ openi…⁸
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 Tigers Orio… Lakela… Publix… 2023-0… consen… "-133" "+112" "-123" "+103"
#> 2 Tigers Orio… Lakela… Publix… 2023-0… whnj "-140" "+118" "-120" "+100"
#> 3 Tigers Orio… Lakela… Publix… 2023-0… draftk… "-130" "+110" "-120" "+100"
#> 4 Tigers Orio… Lakela… Publix… 2023-0… fanduel "" "" "" ""
#> 5 Tigers Orio… Lakela… Publix… 2023-0… bet365… "-130" "+110" "-120" "+100"
#> 6 Philli… Twins Clearw… BayCar… 2023-0… consen… "-130" "+109" "-134" "+113"
#> 7 Philli… Twins Clearw… BayCar… 2023-0… whnj "-130" "+110" "-135" "+115"
#> 8 Philli… Twins Clearw… BayCar… 2023-0… draftk… "-130" "+110" "-135" "+115"
#> 9 Philli… Twins Clearw… BayCar… 2023-0… fanduel "" "" "" ""
#> 10 Philli… Twins Clearw… BayCar… 2023-0… bet365… "-125" "+105" "-135" "+115"
#> # … with 40 more rows, 2 more variables: isBestAwayLine <lgl>,
#> # isBestHomeLine <lgl>, and abbreviated variable names ¹venueCity,
#> # ²venueName, ³startDate, ⁴sportsbookName, ⁵currentHomeOdds,
#> # ⁶currentAwayOdds, ⁷openingHomeOdds, ⁸openingAwayOdds
Created on 2023-02-26 with reprex v2.0.2
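To try the extraction step without hitting the live site, here is a minimal offline sketch. It assumes a made-up miniature page: the JSON path and the homeTeam/awayTeam nickName fields follow the code above, but the team names ("Home A" and so on) are invented placeholders, and the real payload has many more fields.

```r
library(rvest)
library(jsonlite)
library(purrr)

# Hypothetical stand-in for the real page: same nesting as the path used
# above, but with invented values and only two games.
html <- '
<html><body>
<script id="__NEXT_DATA__" type="application/json">
{"props": {"initialState": {"oddsPageState": {"pageState": {"data": {
  "competitionOdds": [
    {"homeTeam": {"nickName": "Home A"}, "awayTeam": {"nickName": "Away A"}},
    {"homeTeam": {"nickName": "Home B"}, "awayTeam": {"nickName": "Away B"}}
  ]}}}}}}
</script>
</body></html>'

# Pull the raw JSON text out of the script tag
next_data <- read_html(html) %>%
  html_element("#__NEXT_DATA__") %>%
  html_text()

# Parse it and walk down to the list of games
games <- parse_json(next_data) %>%
  pluck("props", "initialState", "oddsPageState", "pageState", "data", "competitionOdds")

# One nickname per game
home <- map_chr(games, ~ .x$homeTeam$nickName)
away <- map_chr(games, ~ .x$awayTeam$nickName)
```

If the site changes its layout, pluck() returns NULL for a path that no longer exists, so checking length(games) before rectangling is a cheap sanity test.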