johankent30

Reputation: 75

Scraping a JSON webpage

I am very new to web scraping and am having trouble scraping NBA player data from nba.com. I first tried to scrape the page with bs4 but ran into an issue that, from the articles I read, I believe is caused by the page loading its data through XHR. I found a URL that returns the data as JSON, but my Python program seems to hang and never loads the data. Am I way off track here? Any suggestions? Thanks! (Code below)

import requests
import json

url = "http://stats.nba.com/stats/leaguedashplayerstats?College=&Conference=&Country=&DateFrom=&DateTo=&Division=&DraftPick=&DraftYear=&GameScope=&GameSegment=&Height=&LastNGames=0&LeagueID=00&Location=&MeasureType=Base&Month=0&OpponentTeamID=0&Outcome=&PORound=0&PaceAdjust=N&PerMode=PerGame&Period=0&PlayerExperience=&PlayerPosition=&PlusMinus=N&Rank=N&Season=2017-18&SeasonSegment=&SeasonType=Regular+Season&ShotClockRange=&StarterBench=&TeamID=0&VsConference=&VsDivision=&Weight="

resp = requests.get(url=url)
data = json.loads(resp.text)
print(data)

Upvotes: 0

Views: 4047

Answers (2)

SIM

Reputation: 22440

Give this a shot. It prints the fields from that page under the labels I've defined. By the way, your initial attempt got no response because the server expects a User-Agent header in the request, which it uses to check that the request comes from a real browser rather than a bot. I sent a browser-like User-Agent and got the response.

import requests

url = "http://stats.nba.com/stats/leaguedashplayerstats?College=&Conference=&Country=&DateFrom=&DateTo=&Division=&DraftPick=&DraftYear=&GameScope=&GameSegment=&Height=&LastNGames=0&LeagueID=00&Location=&MeasureType=Base&Month=0&OpponentTeamID=0&Outcome=&PORound=0&PaceAdjust=N&PerMode=PerGame&Period=0&PlayerExperience=&PlayerPosition=&PlusMinus=N&Rank=N&Season=2017-18&SeasonSegment=&SeasonType=Regular+Season&ShotClockRange=&StarterBench=&TeamID=0&VsConference=&VsDivision=&Weight="
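# Send a browser-like User-Agent, since the server ignores requests without one.
# The timeout makes a stalled request fail instead of hanging indefinitely.
resp = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'}, timeout=10)
resp.raise_for_status()  # stop early on an HTTP error status
data = resp.json()

# Each result set keeps its rows in 'rowSet'; the first four columns are read
# as player id, player name, team id, and team abbreviation.
for result_set in data['resultSets']:
    for row in result_set['rowSet']:
        player_id, player_name, team_id, team_abbr = row[:4]
        print("Player_Id: {} Player_name: {} Team_Id: {} Team_abbr: {}".format(
            player_id, player_name, team_id, team_abbr))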
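
Indexing each row by position works, but it depends on the column order staying the same. Here is a minimal sketch of an alternative, assuming each entry in resultSets also carries a headers list naming the columns of its rowSet. The helper name rows_as_dicts, the sample payload, and the column names in it are made up for illustration and are not real API output.

```python
def rows_as_dicts(payload):
    """Pair every row with its column names, one dict per row.

    Assumes each result set looks like {"headers": [...], "rowSet": [[...], ...]}.
    """
    records = []
    for result_set in payload.get("resultSets", []):
        headers = result_set.get("headers", [])
        for row in result_set.get("rowSet", []):
            # zip stops at the shorter sequence, so extra cells are dropped
            records.append(dict(zip(headers, row)))
    return records


# Made-up sample in the assumed shape, used instead of a live request.
sample = {
    "resultSets": [
        {
            "headers": ["PLAYER_ID", "PLAYER_NAME", "TEAM_ID", "TEAM_ABBREVIATION"],
            "rowSet": [
                [1, "Example Player", 10, "EXA"],
                [2, "Another Player", 20, "ANO"],
            ],
        }
    ]
}

records = rows_as_dicts(sample)
for record in records:
    print(record["PLAYER_NAME"], record["TEAM_ABBREVIATION"])
```

With a live response you would pass resp.json() in place of sample, after checking that the real column names match the ones you look up.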

Upvotes: 1

johankent30

Reputation: 75

Just realized that it is because the User-Agent header is different. Once a browser-like one is added, it works.
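
To see the difference for yourself, here is a rough sketch that builds the request without sending it and inspects the headers requests would use. The helper name headers_that_would_be_sent is my own, and the shortened URL is only a placeholder for the full query above.

```python
import requests

# Placeholder: the endpoint without its query string, since nothing is sent here.
URL = "http://stats.nba.com/stats/leaguedashplayerstats"


def headers_that_would_be_sent(url, headers=None):
    """Prepare (but do not send) a GET request and return its final headers."""
    session = requests.Session()
    prepared = session.prepare_request(requests.Request("GET", url, headers=headers))
    return prepared.headers


default_headers = headers_that_would_be_sent(URL)
browser_headers = headers_that_would_be_sent(URL, {"User-Agent": "Mozilla/5.0"})

# The default identifies the client as python-requests/<version>.
print(default_headers["User-Agent"])
# The override replaces it with the browser-like value.
print(browser_headers["User-Agent"])
```

Passing timeout=10 (or similar) to requests.get is also worth doing, so a request the server never answers raises an exception instead of hanging.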

Upvotes: 0
