Reputation: 111
I am fairly new to Python. I am trying to scrape NBA Drives data via https://stats.nba.com/players/drives/
I used Chrome Devtools to find the API URL. I then used the requests package to get the JSON string.
Original code:
import requests
headers = {"User-Agent": "Mozilla/5.0..."}
url = " https://stats.nba.com/stats/leaguedashptstats?College=&Conference=&Country=&DateFrom=&DateTo=&Division=&DraftPick=&DraftYear=&GameScope=&Height=&LastNGames=0&LeagueID=00&Location=&Month=0&OpponentTeamID=0&Outcome=&PORound=0&PerMode=PerGame&PlayerExperience=&PlayerOrTeam=Player&PlayerPosition=&PtMeasureType=Drives&Season=2019-20&SeasonSegment=&SeasonType=Regular+Season&StarterBench=&TeamID=0&VsConference=&VsDivision=&Weight="
r = requests.get(url, headers = headers)
d = r.json()
This no longer works, however. For some reason the request for the URL link below times out on the NBA server. So I need to find a new way to get this information.
I was exploring Chrome devtools and I found out that the desired JSON string was stored in the Network XHR Response tab. Is there any way to scrape that into python. See the image below.
Chrome Devtools: XHR Response JSON string
Upvotes: 0
Views: 471
Reputation: 142793
I tested url with other headers (which I saw in DevTool
for this request) and it seems it needs header Referer
to work correctly
EDIT 2020.08.15:
I had to add new headers to read it
'x-nba-stats-origin': 'stats',
'x-nba-stats-token': 'true',
import requests
headers = {
'User-Agent': 'Mozilla/5.0',
#'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:72.0) Gecko/20100101 Firefox/72.0',
'Referer': 'https://stats.nba.com/players/drives/',
#'Accept': 'application/json, text/plain, */*',
'x-nba-stats-origin': 'stats',
'x-nba-stats-token': 'true',
}
url = 'https://stats.nba.com/stats/leaguedashptstats?College=&Conference=&Country=&DateFrom=&DateTo=&Division=&DraftPick=&DraftYear=&GameScope=&Height=&LastNGames=0&LeagueID=00&Location=&Month=0&OpponentTeamID=0&Outcome=&PORound=0&PerMode=PerGame&PlayerExperience=&PlayerOrTeam=Player&PlayerPosition=&PtMeasureType=Drives&Season=2019-20&SeasonSegment=&SeasonType=Regular+Season&StarterBench=&TeamID=0&VsConference=&VsDivision=&Weight='
r = requests.get(url, headers=headers)
data = r.json()
print(data)
BTW: the same but with params as dictionary so it is easier to set different value
import requests
headers = {
'User-Agent': 'Mozilla/5.0',
#'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:72.0) Gecko/20100101 Firefox/72.0',
'Referer': 'https://stats.nba.com/players/drives/',
#'Accept': 'application/json, text/plain, */*',
'x-nba-stats-origin': 'stats',
'x-nba-stats-token': 'true',
}
url = 'https://stats.nba.com/stats/leaguedashptstats'
params = {
'College': '',
'Conference': '',
'Country': '',
'DateFrom': '',
'DateTo': '',
'Division': '',
'DraftPick': '',
'DraftYear': '',
'GameScope': '',
'Height': '',
'LastNGames': '0',
'LeagueID': '00',
'Location': '',
'Month': '0',
'OpponentTeamID': '0',
'Outcome': '',
'PORound': '0',
'PerMode': 'PerGame',
'PlayerExperience': '',
'PlayerOrTeam': 'Player',
'PlayerPosition': '',
'PtMeasureType': 'Drives',
'Season': '2019-20',
'SeasonSegment': '',
'SeasonType': 'Regular Season',
'StarterBench': '',
'TeamID': '0',
'VsConference': '',
'VsDivision': '',
'Weight': '',
}
r = requests.get(url, headers=headers, params=params)
#print(r.request.url)
data = r.json()
print(data)
Upvotes: 2