rogarui
rogarui

Reputation: 97

Json problems to csv

I'm trying to get some stats from the NBA stats page. I'm following this tutorial-idea https://towardsdatascience.com/using-python-pandas-and-plotly-to-generate-nba-shot-charts-e28f873a99cb

The basic idea is put the data into a csv file.

So I try this code, to get the data from the nba web, trying to get the json file and the convert it to a csv:

import requests
import json
import pandas as pd
from pandas import DataFrame as df
import urllib.request



shot_data_url_start="https://stats.nba.com/events/?flag=3&CFID=33&CFPARAMS=2017-18&PlayerID="
player_id="202695"
shot_data_url_end="&ContextMeasure=FGA&Season=2017-18&section=player&sct=plot"

def shoy_chart(player_id):
   full_url = shot_data_url_start + str(player_id) + shot_data_url_end
   json = requests.get(full_url, headers=headers).json()
return(json)



data = json['resultSets'][0]['rowSets']
columns = json['resultSets'][0]['headers']


df = pd.DataFrame.from_records(data, columns=columns)

And this is the error that notebook shows to me:

TypeError                                 Traceback (most recent call last)
<ipython-input-42-a3452c3a4fc8> in <module>
 18 
 19 
 ---> 20 data = json['resultSets'][0]['rowSets']
 21 columns = json['resultSets'][0]['headers']
 22 

 TypeError: 'module' object is not subscriptable

Anyone can help me, or know another way to get the data into a .csv or excel file?

Upvotes: 0

Views: 134

Answers (4)

Anthony
Anthony

Reputation: 89

Try this out

import pandas as pd
from pandas import DataFrame as df

shot_data_url_start = "https://stats.nba.com/stats/shotchartdetail?AheadBehind=&CFID=33&CFPARAMS=2017-18&ClutchTime=&Conference=&ContextFilter=&ContextMeasure=FGA&DateFrom=&DateTo=&Division=&EndPeriod=10&EndRange=28800&GROUP_ID=&GameEventID=&GameID=&GameSegment=&GroupID=&GroupMode=&GroupQuantity=5&LastNGames=0&LeagueID=00&Location=&Month=0&OnOff=&OpponentTeamID=0&Outcome=&PORound=0&Period=0&PlayerID="
player_id = "204001"
shot_data_url_end = "&PlayerID1=&PlayerID2=&PlayerID3=&PlayerID4=&PlayerID5=&PlayerPosition=&PointDiff=&Position=&RangeType=0&RookieYear=&Season=2017-18&SeasonSegment=&SeasonType=Regular+Season&ShotClockRange=&StartPeriod=1&StartRange=0&StarterBench=&TeamID=0&VsConference=&VsDivision=&VsPlayerID1=&VsPlayerID2=&VsPlayerID3=&VsPlayerID4=&VsPlayerID5=&VsTeamID="

def get_shot_data(player_id):
    full_url = shot_data_url_start + player_id + shot_data_url_end
    data = requests.get(
        full_url,
        headers = {
            "User-Agent": "PostmanRuntime/7.4.0"
        }
    )
    return data.json()

shot_results = get_shot_data(player_id)

result_sets = shot_results['resultSets']

first_result_set = result_sets[0]
row_set = first_result_set['rowSet']
set_headers = first_result_set['headers']

df = pd.DataFrame.from_records(row_set, columns=set_headers)

I see how you got confused with that medium post. You were missing the headers and the url for the NBA api wasn't right. That's what @pierre was trying to say in his response. The url you're using isn't right. If you reread that post you were following, you'll see that the author said he had to dig in to dev tools in order to find that actual url to use in order to grab the JSON.

Edit: Forgot to mention that when I didn't pass a User-Agent in the headers, the request would timeout. If you don't pass that in, you won't get a successful response.

Upvotes: 0

rogarui
rogarui

Reputation: 97

import requests
import json
import pandas as pd
from pandas import DataFrame as df
import urllib.request



shot_data_url_start="https://stats.nba.com/stats/shotchartdetail?AheadBehind=&CFID=33&CFPARAMS=2019-20&ClutchTime=&Conference=&ContextFilter=&ContextMeasure=FGA&DateFrom=&DateTo=&Division=&EndPeriod=10&EndRange=28800&GROUP_ID=&GameEventID=&GameID=&GameSegment=&GroupID=&GroupMode=&GroupQuantity=5&LastNGames=0&LeagueID=00&Location=&Month=0&OnOff=&OpponentTeamID=0&Outcome=&PORound=0&Period=0&PlayerID="
player_id="202330"
shot_data_url_end="&PlayerID1=&PlayerID2=&PlayerID3=&PlayerID4=&PlayerID5=&PlayerPosition=&PointDiff=&Position=&RangeType=0&RookieYear=&Season=2019-20&SeasonSegment=&SeasonType=Regular+Season&ShotClockRange=&StartPeriod=1&StartRange=0&StarterBench=&TeamID=0&VsConference=&VsDivision=&VsPlayerID1=&VsPlayerID2=&VsPlayerID3=&VsPlayerID4=&VsPlayerID5=&VsTeamID="

def shot_chart(player_id):
    full_url = shot_data_url_start + str(player_id) + shot_data_url_end
    response_json = requests.get(full_url).json()
    return(response_json)



    data = response_json['resultSets'][0]['rowSets']
    columns = response_json['resultSets'][0]['headers']


    df = pd.DataFrame.from_records(data, columns=columns)

shot_chart("202330")

What is going on now? the notebook is tucked right know

Upvotes: 0

rogarui
rogarui

Reputation: 97

Ok I've changed the code like this:

import requests
import json
import pandas as pd
from pandas import DataFrame as df
import urllib.request


def shot_chart(player_id):
full_url = "https://stats.nba.com/stats/shotchartdetail?AheadBehind=&CFID=33&CFPARAMS=2017-18&ClutchTime=&Conference=&ContextFilter=&ContextMeasure=FGA&DateFrom=&DateTo=&Division=&EndPeriod=10&EndRange=28800&GROUP_ID=&GameEventID=&GameID=&GameSegment=&GroupID=&GroupMode=&GroupQuantity=5&LastNGames=0&LeagueID=00&Location=&Month=0&OnOff=&OpponentTeamID=0&Outcome=&PORound=0&Period=0&PlayerID=202695&PlayerID1=&PlayerID2=&PlayerID3=&PlayerID4=&PlayerID5=&PlayerPosition=&PointDiff=&Position=&RangeType=0&RookieYear=&Season=2017-18&SeasonSegment=&SeasonType=Regular+Season&ShotClockRange=&StartPeriod=1&StartRange=0&StarterBench=&TeamID=0&VsConference=&VsDivision=&VsPlayerID1=&VsPlayerID2=&VsPlayerID3=&VsPlayerID4=&VsPlayerID5=&VsTeamID="
    response_json = requests.get(full_url, headers=headers)
    return(response_json)



    data = response_json['resultSets'][0]['rowSets']
    columns = response_json['resultSets'][0]['headers']


    df = pd.DataFrame.from_records(data, columns=columns)

Upvotes: 0

Pierre
Pierre

Reputation: 1099

When imported with import json, the name json is referring to the JSON module of the Python standard library. You cannot use it as a regular variable name. If you rename your variable to something else such as response_json, this part of your code will work.

Regarding the rest of the code, the page https://stats.nba.com/events/ doesn't return any JSON text, it is a regular web page with images, menus, a video player, etc... If you want to access the API that returns the shots in JSON format, you will have to use the https://stats.nba.com/stats/shotchartdetail (with the right query string). This API endpoint is mentioned in the tutorial, in the "Chrome XHR tab and resulting json linked by url" image.

Upvotes: 1

Related Questions