Michael M

Reputation: 57

Scraping NBA.com Individual Player Matchups Head to Head Stats Page (covering multiple pages)

I am attempting to scrape the following page using Python (currently Requests & BeautifulSoup) but am struggling to a) obtain meaningful results in a tabular format and b) scrape every page, since most players' data spans multiple pages (e.g., the following player has data spanning 7 pages: https://www.nba.com/stats/player/203081/head-to-head/ )

At the moment, I've been able to run a successful GET request and parse the response with BeautifulSoup, but I'm unsure of the best way to proceed. Any help/suggestions/recommendations are greatly appreciated.

import requests
import pandas as pd
from bs4 import BeautifulSoup

url = 'https://www.nba.com/stats/player/203081/head-to-head/'
r = requests.get(url)
if r.status_code == 200:
    soup = BeautifulSoup(r.content, 'html.parser')
    print(soup)
    table = soup.find('table')
    if table:
        df = pd.read_html(str(table))[0]
        print(df)

Upvotes: 0

Views: 237

Answers (1)

Paul M.

Reputation: 10809

I visited the page in my browser and logged my network traffic, and saw that my browser made several HTTP GET requests to REST APIs. One of them has the endpoint stats/leagueseasonmatchups, which you can query for a specific player, league and season. The response is JSON containing all the table information you're trying to scrape (in one response, so no pagination is needed). Normally, this API is used by the page to populate the DOM asynchronously using JavaScript. Since we know the endpoint, query-string parameters and request headers, we can imitate that HTTP GET request, parse the response, and write it to a CSV:

import sys
from csv import DictWriter

import requests


def get_matchups():
    url = "https://stats.nba.com/stats/leagueseasonmatchups"

    params = {
        "DateFrom": "",
        "DateTo": "",
        "DefPlayerID": "203081",
        "LeagueID": "00",
        "Outcome": "",
        "PORound": "0",
        "PerMode": "Totals",
        "Season": "2020-21",
        "SeasonType": "Regular Season"
    }

    headers = {
        "Accept": "application/json",
        "Accept-Encoding": "gzip, deflate",
        "Referer": "https://www.nba.com/",
        "User-Agent": "Mozilla/5.0",
        "x-nba-stats-origin": "stats",
        "x-nba-stats-token": "true"
    }

    print("Getting matchups for player ID# {}...".format(params["DefPlayerID"]))

    response = requests.get(url, params=params, headers=headers)
    response.raise_for_status()

    data = response.json()

    # The API returns column names and rows separately; zip them into dicts.
    fieldnames = data["resultSets"][0]["headers"]
    for row in data["resultSets"][0]["rowSet"]:
        yield dict(zip(fieldnames, row))


def main():
    all_matchups = list(get_matchups())

    print("Writing to CSV file...")

    with open("output.csv", "w", newline="") as file:
        fieldnames = list(all_matchups[0])  # keys of the first row, in API order
        writer = DictWriter(file, fieldnames=fieldnames)

        writer.writeheader()
        for matchup in all_matchups:
            writer.writerow(matchup)

    print("Done.")

    return 0


if __name__ == "__main__":
    sys.exit(main())

Output (Terminal):

Getting matchups for player ID# 203081...
Writing to CSV file...
Done.
>>> 

Output (CSV):

SEASON_ID,OFF_PLAYER_ID,OFF_PLAYER_NAME,DEF_PLAYER_ID,DEF_PLAYER_NAME,GP,MATCHUP_MIN,PARTIAL_POSS,PLAYER_PTS,TEAM_PTS,MATCHUP_AST,MATCHUP_TOV,MATCHUP_BLK,MATCHUP_FGM,MATCHUP_FGA,MATCHUP_FG_PCT,MATCHUP_FG3M,MATCHUP_FG3A,MATCHUP_FG3_PCT,HELP_BLK,HELP_FGM,HELP_FGA,HELP_FG_PERC,MATCHUP_FTM,MATCHUP_FTA,SFL
22020,202709,Cory Joseph,203081,Damian Lillard,5,17:34,68.6,4,82,1,1,0,2,10,0.2,0,3,0.0,0,0,0,0.0,0,0,0
22020,1628969,Mikal Bridges,203081,Damian Lillard,3,17:28,68.36,18,98,4,1,0,7,8,0.875,3,4,0.75,0,0,0,0.0,1,1,1
22020,1628366,Lonzo Ball,203081,Damian Lillard,3,16:34,65.98,17,77,6,2,1,6,13,0.462,5,11,0.455,0,0,0,0.0,0,0,0
22020,1626220,Royce O'Neale,203081,Damian Lillard,3,14:17,51.4,2,77,0,1,0,1,6,0.167,0,4,0.0,0,0,0,0.0,0,0,0
22020,1626196,Josh Richardson,203081,Damian Lillard,3,11:39,47.9,6,80,2,1,0,2,4,0.5,1,1,1.0,0,0,0,0.0,1,1,1
...
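As an aside: since get_matchups() yields plain dicts, you can load the rows straight into a pandas DataFrame instead of (or in addition to) writing a CSV. A minimal sketch — it uses two hard-coded rows copied from the output above so it runs without hitting the API; in practice you would pass the generator itself (pd.DataFrame(get_matchups())):

```python
import pandas as pd

# Two sample rows, taken from the CSV output above, standing in for
# the dicts that get_matchups() would yield.
rows = [
    {"OFF_PLAYER_NAME": "Cory Joseph", "MATCHUP_FGM": 2, "MATCHUP_FGA": 10},
    {"OFF_PLAYER_NAME": "Mikal Bridges", "MATCHUP_FGM": 7, "MATCHUP_FGA": 8},
]
df = pd.DataFrame(rows)

# With a DataFrame you can sort/filter in memory instead of re-reading the CSV.
top = df.sort_values("MATCHUP_FGM", ascending=False)
print(top.iloc[0]["OFF_PLAYER_NAME"])  # matchup with the most field goals made
```

DataFrame construction accepts any iterable of dicts and infers the columns from the keys, so no intermediate CSV round-trip is needed.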

Upvotes: 1
