Radim Straka

Reputation: 15

Scraping JSON from an endpoint with Requests

1) I want to use Python to scrape a JSON file from an endpoint (for example: http://stats.nba.com/stats/boxscoreplayertrackv2/?GameID=0021700300). I've installed Requests for Python and tried to download the page into a variable like this:

import requests
r = requests.get('http://stats.nba.com/stats/boxscoreplayertrackv2/?GameID=0021700300')

I have tried this on some regular web pages and it worked, but for this endpoint it does not. When I send the request, the Python shell hangs and I have to restart it. I suspect the solution is very simple.

How can I download it?
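While experimenting, a hung request can be turned into an error by passing a `timeout` to `requests.get`. The sketch below is a minimal illustration, not the endpoint's documented usage: the helper name `fetch_boxscore`, the constants, and the 10-second value are assumptions, and the browser-like headers are the ones used in the answer below.

```python
import requests

# Hypothetical constants; the headers mirror the ones in the answer below.
BASE_URL = "https://stats.nba.com/stats/boxscoreplayertrackv2/"
HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:57.0) Gecko/20100101 Firefox/57.0",
    "Accept": "application/json; charset=utf-8",
}


def fetch_boxscore(game_id, timeout=10):
    """Download one box score as parsed JSON (hypothetical helper).

    `timeout` bounds the connection attempt and the wait between received
    bytes, so a silent server raises requests.exceptions.Timeout instead
    of blocking the shell indefinitely.
    """
    # params= builds the "?GameID=..." query string for us
    r = requests.get(BASE_URL, params={"GameID": game_id},
                     headers=HEADERS, timeout=timeout)
    r.raise_for_status()  # turn HTTP 4xx/5xx responses into exceptions
    return r.json()
```

Calling `fetch_boxscore("0021700300")` inside a `try` block that catches `requests.exceptions.Timeout` lets the shell report the problem and carry on instead of freezing.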

2) After this, I would like to take the HTML/JSON and remove some of the code at the beginning and some at the end, so that only the piece of JSON remains that can be transferred to a table (Excel or a database table). My ultimate objective is to automate this process so that every day the script downloads some new JSONs (I only need to increase the number in the URL parameter), modifies them and appends them incrementally to an existing table or Excel file.
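Nothing actually has to be cut from the start or end of the response: the endpoint returns plain JSON, which `r.json()` parses into Python dicts and lists. For the daily step, the game ID can be treated as a zero-padded number. A minimal sketch with a hypothetical `next_game_ids` helper, assuming new games get consecutive IDs and that the leading zeros must be kept:

```python
def next_game_ids(last_game_id, count):
    """Return the `count` IDs that follow `last_game_id`.

    Hypothetical helper: assumes game IDs are consecutive integers stored
    as fixed-width, zero-padded strings such as '0021700300'.
    """
    width = len(last_game_id)   # keep the original string length
    start = int(last_game_id)   # '0021700300' -> 21700300
    return [str(start + i).zfill(width) for i in range(1, count + 1)]


print(next_game_ids("0021700300", 3))
# ['0021700301', '0021700302', '0021700303']
```

Storing the last processed ID (for example in a small text file) would let each daily run continue where the previous one stopped.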

Could you please point me in the right direction for these steps? I just want to play with some data, and this looks like the best way to get it. I'm new to Python and have only some programming basics, so please excuse my primitive questions. I would appreciate any advice.

Upvotes: 0

Views: 1129

Answers (1)

Dan-Dev

Reputation: 9420

It looks like you need to add a couple of headers; you can then access the JSON as you normally would, e.g.:

import requests

# Browser-like headers; the endpoint does not appear to respond without them
headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:57.0) Gecko/20100101 Firefox/57.0',
    'Accept': 'application/json; charset=utf-8',
}
r = requests.get('https://stats.nba.com/stats/boxscoreplayertrackv2/?GameID=0021700300', headers=headers)

j = r.json()  # parse the response body into Python dicts and lists

# Each result set has a name and a list of rows ("rowSet")
for row in j['resultSets']:
    print(row['name'])
    for rowSet in row['rowSet']:
        print(rowSet)

Output:

PlayerStats
['0021700300', 1610612764, 'WAS', 'Washington', 203490, 'Otto Porter Jr.', 'F', '', '32:05', 4.22, 2.25, 5, 11, 16, 48, 0, 0, 28, 0, 2, 4, 0.5, 6, 14, 0.428, 0.444, 0, 0, 0.0]
['0021700300', 1610612764, 'WAS', 'Washington', 202693, 'Markieff Morris', 'F', '', '20:39', 3.81, 1.31, 3, 4, 7, 30, 0, 0, 19, 1, 1, 2, 0.5, 0, 4, 0.0, 0.167, 2, 4, 0.5]
...
['0021700300', 1610612750, 'MIN', 'Minnesota', 201952, 'Jeff Teague', '', 'DNP - Injury/Illness                    ', '0:00', 0.0, 0.0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.0, 0, 0, 0.0, 0.0, 0, 0, 0.0]
TeamStats
['0021700300', 1610612764, 'Wizards', 'WAS', 'Washington', '240:00', 16.74, 29, 60, 89, 399, 4, 0, 287, 23, 14, 31, 0.452, 22, 52, 0.422, 0.434, 12, 21, 0.57]
['0021700300', 1610612750, 'Timberwolves', 'MIN', 'Minnesota', '240:00', 16.58, 30, 53, 83, 407, 1, 1, 296, 27, 18, 37, 0.485, 17, 48, 0.353, 0.412, 12, 18, 0.667]
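Each printed list is one table row. Assuming each result set also carries a `headers` list of column names next to `rowSet` (as NBA stats responses typically do, although the loop above does not print it), the rows can be turned into dictionaries keyed by column name, which map directly onto spreadsheet or database columns. The helper name is hypothetical, and the sample below is truncated: its values come from the TeamStats output above, but its column names are assumed for illustration.

```python
def result_set_to_records(result_set):
    """Turn one entry of j['resultSets'] into a list of dicts.

    Hypothetical helper: assumes the entry has 'headers' (column names)
    and 'rowSet' (rows of the same width) keys.
    """
    headers = result_set["headers"]
    return [dict(zip(headers, row)) for row in result_set["rowSet"]]


# Truncated sample: values from the TeamStats output above,
# column names assumed for illustration.
sample = {
    "name": "TeamStats",
    "headers": ["GAME_ID", "TEAM_ID", "TEAM_NAME", "TEAM_ABBREVIATION"],
    "rowSet": [
        ["0021700300", 1610612764, "Wizards", "WAS"],
        ["0021700300", 1610612750, "Timberwolves", "MIN"],
    ],
}

for record in result_set_to_records(sample):
    print(record["TEAM_NAME"], record["TEAM_ABBREVIATION"])
# Wizards WAS
# Timberwolves MIN
```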

Or, to write to CSV (for Excel):

import csv
import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:57.0) Gecko/20100101 Firefox/57.0',
    'Accept': 'application/json; charset=utf-8',
}
r = requests.get('https://stats.nba.com/stats/boxscoreplayertrackv2/?GameID=0021700300', headers=headers)

j = r.json()

# newline='' stops the csv module from writing blank lines between rows on Windows
with open('test.csv', 'w', newline='') as out_file:
    csv_w = csv.writer(out_file)
    for row in j['resultSets']:
        csv_w.writerow([row['name']])  # section title, e.g. PlayerStats
        for rowSet in row['rowSet']:
            csv_w.writerow(rowSet)
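To build the table up day by day instead of overwriting the file, it can be opened in append mode, writing the header row only when the file is new. A minimal sketch, assuming one CSV file per result set and the `headers`/`rowSet` layout described earlier; the helper `append_result_set` and the file naming are hypothetical.

```python
import csv
import os


def append_result_set(path, result_set):
    """Append the rows of one result set to a CSV file.

    Hypothetical helper: assumes 'headers' and 'rowSet' keys. The header
    row is written only if the file is missing or empty, so repeated
    daily runs keep adding rows underneath it.
    """
    write_header = not os.path.exists(path) or os.path.getsize(path) == 0
    # "a" appends; newline='' avoids blank lines on Windows
    with open(path, "a", newline="") as out_file:
        writer = csv.writer(out_file)
        if write_header:
            writer.writerow(result_set["headers"])
        writer.writerows(result_set["rowSet"])
```

In a daily job, after fetching each new game you could call `append_result_set(row['name'] + '.csv', row)` for every `row` in `j['resultSets']`, giving one growing file per result set (PlayerStats.csv, TeamStats.csv).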

Upvotes: 1
