Reputation: 15
I am using this code to loop through multiple URLs to scrape data. The code works fine until it hits a date with missing data. This is the error message that pops up:
Error in data.frame(away, home, away1H, home1H, awayPinnacle, homePinnacle) : arguments imply differing number of rows: 7, 8
I am very new to coding and could not figure out how to make it keep scraping despite the missing data.
library(rvest)
library(dplyr)
get_data <- function(date) {
  # Specifying URL
  url <- paste0('https://classic.sportsbookreview.com/betting-odds/nba-basketball/money-line/1st-half/?date=', date)
  # Reading the HTML code from website
  oddspage <- read_html(url)
  # Using CSS selectors to scrape away teams
  awayHtml <- html_nodes(oddspage, '.eventLine-value:nth-child(1) a')
  # Using CSS selectors to scrape 1Q scores
  away1QHtml <- html_nodes(oddspage, '.current-score+ .first')
  away1Q <- html_text(away1QHtml)
  away1Q <- as.numeric(away1Q)
  home1QHtml <- html_nodes(oddspage, '.score-periods+ .score-periods .current-score+ .period')
  home1Q <- html_text(home1QHtml)
  home1Q <- as.numeric(home1Q)
  # Using CSS selectors to scrape 2Q scores
  away2QHtml <- html_nodes(oddspage, '.first:nth-child(3)')
  away2Q <- html_text(away2QHtml)
  away2Q <- as.numeric(away2Q)
  home2QHtml <- html_nodes(oddspage, '.score-periods+ .score-periods .period:nth-child(3)')
  home2Q <- html_text(home2QHtml)
  home2Q <- as.numeric(home2Q)
  # Creating first half scores
  away1H <- away1Q + away2Q
  home1H <- home1Q + home2Q
  # Using CSS selectors to scrape scores
  awayScoreHtml <- html_nodes(oddspage, '.first.total')
  awayScore <- html_text(awayScoreHtml)
  awayScore <- as.numeric(awayScore)
  homeScoreHtml <- html_nodes(oddspage, '.score-periods+ .score-periods .total')
  homeScore <- html_text(homeScoreHtml)
  homeScore <- as.numeric(homeScore)
  # Converting away data to text
  away <- html_text(awayHtml)
  # Using CSS selectors to scrape home teams
  homeHtml <- html_nodes(oddspage, '.eventLine-value+ .eventLine-value a')
  # Converting home data to text
  home <- html_text(homeHtml)
  # Using CSS selectors to scrape Pinnacle away odds
  awayPinnacleHtml <- html_nodes(oddspage, '.eventLine-consensus+ .eventLine-book .eventLine-book-value:nth-child(1) b')
  # Converting away odds to text
  awayPinnacle <- html_text(awayPinnacleHtml)
  # Converting away odds to numeric
  awayPinnacle <- as.numeric(awayPinnacle)
  # Using CSS selectors to scrape Pinnacle home odds
  homePinnacleHtml <- html_nodes(oddspage, '.eventLine-consensus+ .eventLine-book .eventLine-book-value+ .eventLine-book-value b')
  # Converting home odds to text
  homePinnacle <- html_text(homePinnacleHtml)
  # Converting home odds to numeric
  homePinnacle <- as.numeric(homePinnacle)
  # Create data frame
  df <- data.frame(away, home, away1H, home1H, awayPinnacle, homePinnacle)
}
date_vec <- sprintf('201902%02d', 02:06)
all_data <- do.call(rbind, lapply(date_vec, get_data))
View(all_data)
Upvotes: 1
Views: 92
Reputation: 21264
I'd recommend purrr::map_dfr() instead of do.call(rbind, lapply(...)). Then you can wrap your call to get_data() with possibly(), which is a nice way to catch errors and keep going.
library(purrr)
map_dfr(date_vec, possibly(get_data, otherwise = data.frame()))
Output:
away home away1H home1H awayPinnacle homePinnacle
1 L.A. Clippers Detroit 47 65 116 -131
2 Milwaukee Washington 73 50 -181 159
3 Chicago Charlotte 60 51 192 -220
4 Brooklyn Orlando 48 44 121 -137
5 Indiana Miami 53 54 117 -133
6 Dallas Cleveland 58 55 -159 140
7 L.A. Lakers Golden State 58 63 513 -651
8 New Orleans San Antonio 50 63 298 -352
9 Denver Minnesota 61 64 107 -121
10 Houston Utah 63 50 186 -213
11 Atlanta Phoenix 58 57 110 -125
12 Philadelphia Sacramento 52 62 -139 123
13 Memphis New York 42 41 -129 114
14 Oklahoma City Boston 58 66 137 -156
15 L.A. Clippers Toronto 51 65 228 -263
16 Atlanta Washington 61 57 172 -196
17 Denver Detroit 55 68 -112 -101
18 Milwaukee Brooklyn 51 42 -211 184
19 Indiana New Orleans 53 50 -143 127
20 Houston Phoenix 63 57 -256 222
21 San Antonio Sacramento 59 63 -124 110
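Because possibly(get_data, otherwise = data.frame()) returns an empty data frame for a failing date, those dates simply contribute no rows and drop out silently. If you also want to know which dates failed, purrr::safely() works like possibly() but keeps the error object alongside the result. A minimal sketch, reusing date_vec and get_data() from the question:
results <- map(date_vec, safely(get_data, otherwise = NULL))
names(results) <- date_vec
# Dates whose scrape threw an error
failed_dates <- date_vec[map_lgl(results, ~ !is.null(.x$error))]
# Row-bind the successful scrapes (NULL results are dropped)
all_data <- map_dfr(results, "result")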
Upvotes: 2