Reputation: 143
I'd like to scrape team advanced stats from stats.nba.com.
My current code to get the XHR file where the data is stored is:
library(httr)
library(jsonlite)
nba <- GET('https://stats.nba.com/stats/leaguedashteamstats?Conference=&DateFrom=11%2F12%2F2019&DateTo=&Division=&GameScope=&GameSegment=&LastNGames=0&LeagueID=00&Location=&MeasureType=Advanced&Month=0&OpponentTeamID=0&Outcome=&PORound=0&PaceAdjust=N&PerMode=PerGame&Period=0&PlayerExperience=&PlayerPosition=&PlusMinus=N&Rank=N&Season=2019-20&SeasonSegment=&SeasonType=Regular+Season&ShotClockRange=&StarterBench=&TeamID=0&TwoWay=0&VsConference=&VsDivision=')
I get the URL via these steps in Chrome: Inspect -> Network -> XHR
The code throws this error:
Error in curl::curl_fetch_memory(url, handle = handle) :
LibreSSL SSL_read: SSL_ERROR_SYSCALL, errno 60
I also tried it with custom advanced filters on the website, which either results in the same error or leaves the code running forever. I'm not that great at web scraping, so I would appreciate it if anyone could point out what the issue is here.
Upvotes: 1
Views: 773
Reputation: 174348
I have had a good look at this. It looks like this site goes to some lengths to prevent scraping, and won't give you the JSON from that URL unless you provide it with cookies that are generated by a back-and-forth between your browser's javascript and their own servers. They also monitor request timings with New Relic technology and are therefore likely to block your IP if you scrape multiple pages. It wouldn't be impossible, but it would be very, very hard.
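For completeness, here is a minimal sketch of what attaching browser-like headers with httr looks like; the header values are illustrative, and the request may still hang or be refused for the reasons above, since the cookies the server expects are set by javascript:

    library(httr)
    library(jsonlite)

    # Same leaguedashteamstats URL as in the question (shortened here for readability)
    url <- 'https://stats.nba.com/stats/leaguedashteamstats?...'

    resp <- GET(
      url,
      add_headers(
        `User-Agent` = 'Mozilla/5.0',          # pretend to be a browser
        Referer      = 'https://stats.nba.com/' # some endpoints check this
      ),
      timeout(10)                               # fail fast instead of hanging forever
    )

    if (status_code(resp) == 200) {
      dat <- fromJSON(content(resp, as = 'text'))
    }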
If you are desperate for the data you could look into using the NBA API, which requires a sign-up but is free to use for 1000 requests per day.
The other option is to automate a browser using RSelenium to get the html of the fully rendered pages.
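If you go the RSelenium route, a rough sketch would look something like the following (the page URL is illustrative, and rsDriver() assumes it can download and start a local Selenium driver for you):

    library(RSelenium)
    library(rvest)

    # Start a browser session; rsDriver() launches a local Selenium server + driver
    driver <- rsDriver(browser = "firefox", verbose = FALSE)
    client <- driver$client

    # Let the page's javascript run, then grab the fully rendered html
    client$navigate("https://stats.nba.com/teams/advanced/")
    Sys.sleep(10)  # crude wait for the table to render; adjust as needed

    page   <- read_html(client$getPageSource()[[1]])
    tables <- page %>% html_nodes("table") %>% html_table(fill = TRUE)

    # Tidy up
    client$close()
    driver$server$stop()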
Of course, if you only want this one page, you can just copy the html from Chrome's inspector, then use rvest::read_html(readClipboard()).
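A small usage sketch for that route (note that readClipboard() is Windows-only; on macOS or Linux, paste the html into a file and read that instead):

    library(rvest)

    # After copying the rendered <html> element from Chrome's inspector:
    page <- read_html(readClipboard())      # Windows clipboard
    # page <- read_html("page_copy.html")   # or paste into a file on macOS/Linux

    stats_tables <- page %>% html_nodes("table") %>% html_table(fill = TRUE)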
Upvotes: 2