Pedro Guizar

Reputation: 367

How to scrape NBA data?

I want to compare rookies across leagues using stats like points per game (PPG). ESPN and NBA.com have great tables to scrape (as does Basketball-Reference), but I just found out that the tables aren't in the page's HTML, so I can't use rvest. For context, I'm trying to scrape tables like this one (from NBA.com):

https://i.sstatic.net/SdKjE.png

I'm trying to learn how to use httr and jsonlite for this, but I'm running into some issues. I followed the answer in this post, but it doesn't work for me.

This is what I've tried:

library(httr)
library(jsonlite)
library(magrittr)  # provides the %>% pipe; neither httr nor jsonlite exports it
coby.white <- GET('https://www.nba.com/players/coby/white/1629632')
out <- content(coby.white, as = "text") %>%
  fromJSON(flatten = FALSE)

However, I get an error:

Error: lexical error: invalid char in json text.
                                       <!DOCTYPE html><html class="" l
                     (right here) ------^

Is there an easier way to scrape a table from ESPN or NBA, or is there a solution to this issue?

Upvotes: 1

Views: 2286

Answers (2)

Robert Frey

Reputation: 23

You actually can scrape this with rvest. Here's an example that scrapes White's totals table from Basketball-Reference. On Sports Reference sites, any table that is not the first table on the page is placed inside an HTML comment, so we must extract the comment nodes first and then parse the desired table out of them.

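library(rvest)
library(dplyr)

cobywhite <- 'https://www.basketball-reference.com/players/w/whiteco01.html'

# Tables after the first one are wrapped in HTML comments: pull out the text of
# every comment node, glue it together, re-parse it as HTML, then pick the table by id
totalsdf <- cobywhite %>%
  read_html() %>%
  html_nodes(xpath = '//comment()') %>%
  html_text() %>%
  paste(collapse = '') %>%
  read_html() %>%
  html_node("#totals") %>%
  html_table()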
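To see why the comment step is needed without hitting the live site, here is a minimal self-contained sketch on a made-up page. The HTML string, its table ids and the numbers in it are dummy values chosen for illustration; only the pipeline (`//comment()` → `html_text()` → `paste()` → `read_html()`) mirrors the answer's code.

```r
library(rvest)  # also re-exports the %>% pipe

# Dummy page (made-up markup and values): the first table is ordinary HTML,
# the second sits inside an HTML comment, mimicking the Sports Reference layout
page_html <- '
<html><body>
  <table id="per_game"><tr><th>Season</th><th>PTS</th></tr>
    <tr><td>S1</td><td>10</td></tr></table>
  <!--
  <table id="totals"><tr><th>Season</th><th>PTS</th></tr>
    <tr><td>S1</td><td>100</td></tr></table>
  -->
</body></html>'

doc <- read_html(page_html)

# Searching the parsed page directly finds nothing: the commented table
# is just comment text, not part of the DOM
length(html_nodes(doc, "#totals"))

# Extract the comment text, re-parse it as HTML, and the table becomes selectable
totals <- doc %>%
  html_nodes(xpath = "//comment()") %>%
  html_text() %>%
  paste(collapse = "") %>%
  read_html() %>%
  html_node("#totals") %>%
  html_table()

totals
```

The first lookup returns zero nodes because the browser-visible table is never parsed into the document tree; re-parsing the comment text is what makes `#totals` reachable.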

Upvotes: 1

QHarr

Reputation: 84465

PPG and other stats come from

https://data.nba.net/prod/v1/2019/players/1629632_profile.json

and player info (e.g. weight and height) comes from

https://www.nba.com/players/active_players.json

So you could use jsonlite to parse them, e.g.:

library(jsonlite)

data <- jsonlite::read_json('https://data.nba.net/prod/v1/2019/players/1629632_profile.json')

You can find these endpoints in the Network tab of the browser's developer tools when you reload the page. It looks like you can change the player ID in the URL to get a different player's info for the season.
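A minimal sketch of that idea, using two hypothetical helpers (`nba_profile_url()` and `get_json()`, not part of any package). It assumes the URL pattern above generalises to other player IDs and seasons, and that the endpoint answers with an `application/json` content type; these are undocumented endpoints, so they may change or disappear. The content-type check also explains the error in the question: the player page on www.nba.com returns HTML, and passing HTML to `fromJSON()` produces "lexical error: invalid char in json text".

```r
library(httr)
library(jsonlite)

# Hypothetical helper: fills the profile URL pattern quoted above with a
# player ID and season (assumes the pattern holds for other players/seasons)
nba_profile_url <- function(player_id, season = 2019) {
  sprintf("https://data.nba.net/prod/v1/%s/players/%s_profile.json",
          season, player_id)
}

# Hypothetical helper: fetch a URL and parse the body as JSON, but stop with
# a clear message if the server sent something else (e.g. an HTML page)
get_json <- function(url) {
  resp <- GET(url)
  stop_for_status(resp)                      # error on 4xx/5xx responses
  if (http_type(resp) != "application/json") {
    stop("Expected JSON but got '", http_type(resp), "' from ", url)
  }
  # simplifyVector = FALSE matches read_json()'s default: a nested list
  fromJSON(content(resp, as = "text", encoding = "UTF-8"),
           simplifyVector = FALSE)
}
```

Usage (requires network access, and only works while the endpoint is still served): `coby <- get_json(nba_profile_url(1629632))`.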

Upvotes: 1
