Able Archer

Reputation: 569

How to web scrape the starting lineup for the NBA?

I am new to web scraping and could use some help. I would like to scrape the NBA's starting lineups, the teams, and the players' positions using XPath. I have only started on the names because I ran into an issue.

Here is my code so far:

from urllib.request import urlopen
from lxml.html import fromstring 


url = "https://www.lineups.com/nba/lineups"

content = str(urlopen(url).read())
comment = content.replace("-->","").replace("<!--","")
tree = fromstring(comment)


for nba, bball_row in enumerate(tree.xpath('//tr[contains(@class,"t-content")]')):
    names = bball_row.xpath('.//span[@_ngcontent-c5="long-player-name"]/text()')[0]
    print(names)

The program runs without error, but the names do not print. Any tips on how to parse with XPath more efficiently would be greatly appreciated. I tried XPath Helper and XPath Finder; maybe there are some tricks in those tools that would make the process easier. Thanks in advance for your time and effort!

Upvotes: 3

Views: 1386

Answers (1)

Andersson

Reputation: 52675

The required content is located inside a script node that looks like this:

<script nonce="STATE_TRANSFER_TOKEN">window['TRANSFER_STATE'] = {...}</script>

You can try the following to extract the data as a plain Python dictionary:

import re
import json
import requests

source = requests.get("https://www.lineups.com/nba/lineups").text
dictionary = json.loads(re.search(r"window\['TRANSFER_STATE'\]\s=\s(\{.*\})<\/script>", source).group(1))
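The one-liner above raises AttributeError if the pattern is not found, because re.search returns None in that case. A minimal sketch of a more defensive version follows; the function name extract_transfer_state and the sample HTML string are made up for illustration, and the non-greedy match with re.DOTALL is an assumption about how the page is laid out, not something confirmed for the live site.

```python
import json
import re

# Same idea as the pattern above, compiled once. The non-greedy group plus
# DOTALL is an assumption: it stops at the first "}" followed by </script>
# and still matches if the JSON happens to span several lines.
STATE_RE = re.compile(
    r"window\['TRANSFER_STATE'\]\s*=\s*(\{.*?\})\s*</script>",
    re.DOTALL,
)


def extract_transfer_state(html):
    """Return the embedded state as a dict, or None if it is not in the page."""
    match = STATE_RE.search(html)
    if match is None:
        return None
    return json.loads(match.group(1))


# Made-up HTML fragment shaped like the script node shown earlier.
sample = (
    '<script nonce="STATE_TRANSFER_TOKEN">'
    "window['TRANSFER_STATE'] = {\"k\": {\"data\": []}}</script>"
)
print(extract_transfer_state(sample))
print(extract_transfer_state("<html></html>"))
```

Returning None instead of raising lets the caller decide what to do when the site changes its markup.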

Optionally, paste the output of dictionary into an online JSON beautifier and click "Beautify" to see the data as readable JSON.
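A readable view is also available without leaving Python. This is a minimal sketch using json.dumps from the standard library; the small dictionary here is only a stand-in for the parsed page data, which is much larger.

```python
import json

# Stand-in for the parsed `dictionary`; not real data from the site.
dictionary = {"data": [{"home_players": [{"name": "Player One"}]}]}

# indent=2 puts each nested level on its own indented line;
# sort_keys=True keeps the key order stable between runs.
pretty = json.dumps(dictionary, indent=2, sort_keys=True)
print(pretty)
```

Scanning the indented output is usually the quickest way to find which keys hold the values you need.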

Then you can access the required values by key, e.g.

for player in dictionary['https://api.lineups.com/nba/fetch/lineups/gateway']['data'][0]['home_players']:
    print(player['name'])

Kyrie Irving
Jaylen Brown
Jayson Tatum
Gordon Hayward
Al Horford

for player in dictionary['https://api.lineups.com/nba/fetch/lineups/gateway']['data'][0]['away_players']:
    print(player['name'])

D.J. Augustin
Evan Fournier
Jonathan Isaac
Aaron Gordon
Nikola Vucevic

Update

I guess I just made it overcomplicated :)

It should be as simple as below:

import requests

source = requests.get("https://api.lineups.com/nba/fetch/lineups/gateway").json()
for player in source['data'][0]['away_players']:
    print(player['name'])

Update 2

To get the lineups for all teams, use the code below:

import requests

source = requests.get("https://api.lineups.com/nba/fetch/lineups/gateway").json()

for team in source['data']:
    print("\n%s players\n" % team['home_route'].capitalize())
    for player in team['home_players']:
        print(player['name'])
    print("\n%s players\n" % team['away_route'].capitalize())
    for player in team['away_players']:
        print(player['name'])
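Since the question also asks for teams and positions, the same loop can collect rows instead of printing names. This is only a sketch: the helper name lineup_rows and the sample payload are made up, and the "position" key is an assumption about the response, so it is read with .get() and comes back as None if the key does not exist. Inspect the real JSON to find the actual key.

```python
def lineup_rows(payload):
    """Flatten a gateway-style payload into (team, side, name, position) tuples."""
    rows = []
    for game in payload.get("data", []):
        for side in ("home", "away"):
            team = game.get(side + "_route", "")
            for player in game.get(side + "_players", []):
                # "position" is an assumed key; .get() yields None if absent.
                rows.append(
                    (team, side, player.get("name"), player.get("position"))
                )
    return rows


# Made-up payload shaped like the response used above.
sample = {
    "data": [
        {
            "home_route": "team-a",
            "away_route": "team-b",
            "home_players": [{"name": "Player One", "position": "PG"}],
            "away_players": [{"name": "Player Two"}],
        }
    ]
}

for row in lineup_rows(sample):
    print(row)
```

With the live data, the call would be lineup_rows(source), where source is the parsed JSON from the request above.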

Upvotes: 4
