Reputation: 75
I am very new to web scraping and am having trouble scraping NBA player data from nba.com. I first tried to scrape the page with bs4 but ran into an issue that, from the articles I read, I believe happens because the data is loaded via XHR. I was able to find a URL that returns the data as JSON, but my Python program hangs and never loads the data. Again, I am very new to web scraping, but I thought I'd check whether I am way off track here. Any suggestions? Thanks! (Code below)
import requests
import json
url = "http://stats.nba.com/stats/leaguedashplayerstats?College=&Conference=&Country=&DateFrom=&DateTo=&Division=&DraftPick=&DraftYear=&GameScope=&GameSegment=&Height=&LastNGames=0&LeagueID=00&Location=&MeasureType=Base&Month=0&OpponentTeamID=0&Outcome=&PORound=0&PaceAdjust=N&PerMode=PerGame&Period=0&PlayerExperience=&PlayerPosition=&PlusMinus=N&Rank=N&Season=2017-18&SeasonSegment=&SeasonType=Regular+Season&ShotClockRange=&StarterBench=&TeamID=0&VsConference=&VsDivision=&Weight="
resp = requests.get(url=url)
data = json.loads(resp.text)
print(data)
Upvotes: 0
Views: 4047
Reputation: 22440
Give this a shot. It prints the fields I've labeled (player ID, player name, team ID, and team abbreviation) for every row on that page. By the way, your initial attempt got no response because the server expects a User-Agent header in the request, which it uses to check that the request comes from a real browser rather than a bot. I sent a browser-like User-Agent and got the response.
import requests

url = "http://stats.nba.com/stats/leaguedashplayerstats?College=&Conference=&Country=&DateFrom=&DateTo=&Division=&DraftPick=&DraftYear=&GameScope=&GameSegment=&Height=&LastNGames=0&LeagueID=00&Location=&MeasureType=Base&Month=0&OpponentTeamID=0&Outcome=&PORound=0&PaceAdjust=N&PerMode=PerGame&Period=0&PlayerExperience=&PlayerPosition=&PlusMinus=N&Rank=N&Season=2017-18&SeasonSegment=&SeasonType=Regular+Season&ShotClockRange=&StarterBench=&TeamID=0&VsConference=&VsDivision=&Weight="
# The server expects a browser-like User-Agent, so send one.
# The timeout makes a stalled request raise an error instead of hanging.
resp = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'}, timeout=10)
data = resp.json()
storage = data['resultSets']
for elem in storage:
    all_list = elem['rowSet']
    for item in all_list:
        # The first four columns of each row
        Player_Id = item[0]
        Player_name = item[1]
        Team_Id = item[2]
        Team_abbr = item[3]
        print("Player_Id: {} Player_name: {} Team_Id: {} Team_abbr: {}".format(
            Player_Id, Player_name, Team_Id, Team_abbr))
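If you want more than the first four columns, indexing each row by position gets tedious. Here is a rough sketch of an alternative, assuming each entry in data['resultSets'] also carries a 'headers' list with the column names next to 'rowSet' (check your own response to confirm). The helper name rows_to_dicts and the sample values are made up for illustration and are not real NBA data.

```python
def rows_to_dicts(result_set):
    """Pair each row of one result set with its column names."""
    # Assumption: the result set has a 'headers' list whose order
    # matches the order of the values in every row of 'rowSet'.
    headers = result_set["headers"]
    return [dict(zip(headers, row)) for row in result_set["rowSet"]]


# Made-up sample shaped like one entry of data['resultSets'].
sample = {
    "headers": ["PLAYER_ID", "PLAYER_NAME", "TEAM_ID", "TEAM_ABBREVIATION"],
    "rowSet": [
        [1, "Player One", 10, "AAA"],
        [2, "Player Two", 20, "BBB"],
    ],
}

for player in rows_to_dicts(sample):
    # Look fields up by name instead of by position.
    print(player["PLAYER_NAME"], player["TEAM_ABBREVIATION"])
```

With the real response you would call rows_to_dicts(elem) for each elem in data['resultSets'] and then read whichever columns you need by name.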
Upvotes: 1
Reputation: 75
Just realized that it was because of the User-Agent header. Once that is added, it works.
Upvotes: 0