I am new to web scraping and could use some help. I would like to use XPath to scrape the NBA starting lineups, the teams, and the players' positions. So far I have only started on the names, because I ran into an issue.
Here is my code so far:
from urllib.request import urlopen
from lxml.html import fromstring
url = "https://www.lineups.com/nba/lineups"
content = str(urlopen(url).read())
comment = content.replace("-->","").replace("<!--","")
tree = fromstring(comment)
for nba, bball_row in enumerate(tree.xpath('//tr[contains(@class,"t-content")]')):
    names = bball_row.xpath('.//span[@_ngcontent-c5="long-player-name"]/text()')[0]
    print(names)
The program runs without error, but no names are printed. Any tips on parsing with XPath more efficiently would be greatly appreciated. I have tried XPath Helper and XPath Finder; maybe they have some tricks that make the process easier. Thanks in advance for your time and effort!
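A side note on the question's code, separate from the XPath problem. Why nothing prints: if the rows had matched but the inner span XPath had not, the [0] would raise an IndexError. So the outer XPath most likely matches no rows at all in the raw HTML, and the loop body never runs (the answer below explains where the data actually lives). Also, str(urlopen(url).read()) does not decode the page. It returns the repr of the bytes object, so the text starts with b' and newlines and non-ASCII characters remain escape sequences. A minimal sketch of the fix, assuming UTF-8. In real code the charset can come from urlopen(url).headers.get_content_charset(). The helper name decode_page and the sample bytes are hypothetical:

```python
def decode_page(raw: bytes, encoding: str = "utf-8") -> str:
    """Turn the bytes from urlopen(...).read() into real text.

    Assumption: the page is UTF-8 unless told otherwise.
    """
    return raw.decode(encoding)


# Hypothetical page bytes containing a non-ASCII name.
sample = "<span>Nikola Vučević</span>\n".encode("utf-8")

print(str(sample))          # b'<span>Nikola Vu\xc4\x8de\xc4\x87</span>\n'  (a repr, not text)
print(decode_page(sample))  # <span>Nikola Vučević</span>
```

lxml's fromstring also accepts the raw bytes directly, which avoids the str() round-trip altogether.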
The required content is located inside a script node that looks like this:
<script nonce="STATE_TRANSFER_TOKEN">window['TRANSFER_STATE'] = {...}</script>
You can try the following to extract the data as a plain Python dictionary:
import re
import json
import requests
source = requests.get("https://www.lineups.com/nba/lineups").text
dictionary = json.loads(re.search(r"window\['TRANSFER_STATE'\]\s=\s(\{.*\})<\/script>", source).group(1))
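The one-liner above assumes the JSON sits on a single line, because without re.DOTALL the . does not match newlines. It also fails with an AttributeError on None if the page layout changes. Here is a slightly more defensive sketch of the same regex approach. The function name extract_transfer_state is hypothetical, and it is exercised on a hypothetical, trimmed-down page rather than the live site:

```python
import json
import re

# Same pattern idea as above, but tolerant of extra whitespace around "=" and
# of a JSON blob spanning several lines (re.DOTALL). The non-greedy .*? stops
# at the first "}" that is followed by </script>.
TRANSFER_STATE_RE = re.compile(
    r"window\['TRANSFER_STATE'\]\s*=\s*(\{.*?\})\s*</script>", re.DOTALL
)


def extract_transfer_state(html: str) -> dict:
    match = TRANSFER_STATE_RE.search(html)
    if match is None:
        # re.search returns None when nothing matches; fail with a clear
        # message instead of an AttributeError from calling .group() on None.
        raise ValueError("TRANSFER_STATE script not found; the page may have changed")
    return json.loads(match.group(1))


# Hypothetical page: the JSON spans several lines and another script follows.
sample_html = """<html><body>
<script nonce="STATE_TRANSFER_TOKEN">window['TRANSFER_STATE'] = {
  "https://api.lineups.com/nba/fetch/lineups/gateway": {"data": [1, 2]}
}</script>
<script>var other = {"x": 1}</script>
</body></html>"""

state = extract_transfer_state(sample_html)
print(state)  # {'https://api.lineups.com/nba/fetch/lineups/gateway': {'data': [1, 2]}}
```

Regex extraction stays brittle either way. If the site moves this data, the JSON endpoint used in the updates below is the sturdier option.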
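Optionally, paste the output of json.dumps(dictionary) into an online JSON formatter and click "Beautify" to see the data as readable JSON (print(dictionary) shows Python's repr, with single quotes, which such tools usually reject).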
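Alternatively, json.dumps can do the indentation itself, so nothing needs to leave your machine. This is a minimal sketch. The dictionary here is a hypothetical stand-in that only mimics keys used in this answer:

```python
import json

# Hypothetical stand-in for the dictionary extracted above.
dictionary = {
    "data": [
        {
            "home_players": [{"name": "Kyrie Irving"}],
            "away_players": [{"name": "D.J. Augustin"}],
        }
    ]
}

# indent=2 nests the output readably; sort_keys makes large objects easier to scan.
pretty = json.dumps(dictionary, indent=2, sort_keys=True)
print(pretty)
```

For a large response, writing pretty to a file and opening it in an editor with JSON folding is often easier than scrolling the terminal.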
Then you can access the required values by key, e.g.:
for player in dictionary['https://api.lineups.com/nba/fetch/lineups/gateway']['data'][0]['home_players']:
    print(player['name'])
Kyrie Irving
Jaylen Brown
Jayson Tatum
Gordon Hayward
Al Horford
for player in dictionary['https://api.lineups.com/nba/fetch/lineups/gateway']['data'][0]['away_players']:
    print(player['name'])
D.J. Augustin
Evan Fournier
Jonathan Isaac
Aaron Gordon
Nikola Vucevic
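Both loops above repeat the same long key path. Below is a small helper that returns both sides of one game at once. The name lineup_names is hypothetical, and the sample dictionary is trimmed-down illustrative data that only mimics the keys used above ('data', 'home_players', 'away_players', 'name'):

```python
GATEWAY = "https://api.lineups.com/nba/fetch/lineups/gateway"


def lineup_names(dictionary: dict, game_index: int = 0) -> dict:
    """Return {'home': [...], 'away': [...]} player names for one game."""
    game = dictionary[GATEWAY]["data"][game_index]
    return {
        "home": [player["name"] for player in game["home_players"]],
        "away": [player["name"] for player in game["away_players"]],
    }


# Hypothetical input with the same structure the loops above rely on.
sample = {GATEWAY: {"data": [{
    "home_players": [{"name": "Kyrie Irving"}, {"name": "Al Horford"}],
    "away_players": [{"name": "D.J. Augustin"}, {"name": "Nikola Vucevic"}],
}]}}

print(lineup_names(sample))
# {'home': ['Kyrie Irving', 'Al Horford'], 'away': ['D.J. Augustin', 'Nikola Vucevic']}
```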
Update
I guess I just made it overcomplicated :)
It can be as simple as requesting the JSON API endpoint directly:
import requests
source = requests.get("https://api.lineups.com/nba/fetch/lineups/gateway").json()
for player in source['data'][0]['away_players']:
    print(player['name'])
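The question's code used urllib rather than requests, so here is a standard-library-only sketch of the same API call. The browser-like User-Agent header and the 10-second timeout are my assumptions (some sites block the default Python-urllib agent); I have not verified whether this endpoint needs either. The name fetch_json is hypothetical:

```python
import json
from urllib.request import Request, urlopen

API_URL = "https://api.lineups.com/nba/fetch/lineups/gateway"


def fetch_json(url: str = API_URL, timeout: float = 10.0):
    # Assumption: a browser-like User-Agent avoids being rejected as a bot.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        # json.loads accepts bytes directly (Python 3.6+), so no manual decode.
        return json.loads(response.read())
```

Usage: source = fetch_json(), followed by the same loop over source['data'][0]['away_players'].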
Update 2
To get the lineups for all teams, use the following:
import requests
source = requests.get("https://api.lineups.com/nba/fetch/lineups/gateway").json()
for team in source['data']:
    print("\n%s players\n" % team['home_route'].capitalize())
    for player in team['home_players']:
        print(player['name'])
    print("\n%s players\n" % team['away_route'].capitalize())
    for player in team['away_players']:
        print(player['name'])
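If you want to reuse the lineups (e.g. write them to a CSV) instead of printing them, you can collect them into a dictionary keyed by team route. This is a minimal sketch. It uses the same keys as the loop above, while the function name lineups_by_team and the route values "celtics"/"magic" in the sample are hypothetical:

```python
def lineups_by_team(source: dict) -> dict:
    """Map each team route to the list of its starters' names."""
    lineups = {}
    for game in source["data"]:
        for side in ("home", "away"):
            lineups[game[side + "_route"]] = [
                player["name"] for player in game[side + "_players"]
            ]
    return lineups


# Hypothetical response with one game; real route values may differ.
sample = {"data": [{
    "home_route": "celtics",
    "away_route": "magic",
    "home_players": [{"name": "Kyrie Irving"}, {"name": "Jaylen Brown"}],
    "away_players": [{"name": "D.J. Augustin"}, {"name": "Evan Fournier"}],
}]}

for team, names in lineups_by_team(sample).items():
    print(team.capitalize(), names)
# Celtics ['Kyrie Irving', 'Jaylen Brown']
# Magic ['D.J. Augustin', 'Evan Fournier']
```

If a team appeared in more than one game in the response, the later game would overwrite the earlier entry.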