Reputation: 4668
I am looking to scrape some data from an ESPN site using Python.
http://www.espn.co.uk/rugby/playerstats?gameId=293905&league=289234
I am using Beautiful Soup to extract the content from the page.
item = soup.findAll('span', attrs={'data-reactid': '136'})[0].text
This only gives me the column heading. The page contains elements with data-reactid attributes that are not reflected in the URL. How does one navigate these data-reactid elements? The URL stays the same when you click the defending or attacking link.
Upvotes: 0
Views: 301
Reputation: 2263
The BeautifulSoup path looks difficult. I think this may work for you:
import requests
import json
import re
url = "http://www.espn.co.uk/rugby/playerstats?gameId=293905&league=289234"
html_doc = requests.get(url)
# not the best regex but it works. there's a lot of data.
stats = json.loads(re.search(r"window\.__INITIAL_STATE__\s*=\s*({.*});", html_doc.text).group(1))
# show what we have
stats['gamePackage']['matchLineUp'].keys()
# Out[42]: dict_keys(['text', 'home', 'away', 'gameState', 'sport', 'show'])
# no idea what this sport is. a typo?
stats['gamePackage']['matchLineUp']['sport']
# Out[43]: 'rugby'
stats['gamePackage']['matchLineUp']['home']
# {'name': 'ITALY',
# 'logo': 'http://a1.espncdn.com/combiner/i?img=/i/teamlogos/rugby/teams/500/20.png&h=35&w=35',
# 'team': [
# {'id': '91554',
# 'url': 'http://en.espn.co.uk/sport/rugby/player/91554.html',
# 'name': 'Jayden Hayward',
# 'number': '15',
# 'position': 'FB',
# 'captain': False,
# 'subbed': False,
# 'homeAway': 'home',
# ...
And you can iterate, or whatever:
for home_player in stats['gamePackage']['matchLineUp']['home']['team']:
    print("{} - {}".format(home_player['name'], home_player['number']))
Jayden Hayward - 15
Tommaso Benvenuti - 14
Michele Campagnaro - 13
Tommaso Castello - 12
Luca Sperandio - 11
Tommaso Allan - 10
Tito Tebaldi - 9
Andrea Lovotti - 1
Leonardo Ghiraldini - 2
Simone Ferrari - 3
Alessandro Zanni - 4
Dean Budd - 5
Sebastian Negri - 6
Jake Polledri - 7
Braam Steyn - 8
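If you want both squads in one place, something like the following might help. It is only a sketch: the helper name `lineup_rows` is made up, and it assumes the 'away' entry has the same shape as the 'home' entry shown above, which I have not checked. The sample dict is a hypothetical stand-in for stats['gamePackage']['matchLineUp'].

```python
def lineup_rows(match_line_up):
    """Flatten the 'home' and 'away' squads into one list of dicts."""
    rows = []
    for side in ("home", "away"):
        # Assumption: both sides look like {'name': ..., 'team': [...]}.
        squad = match_line_up.get(side, {})
        for player in squad.get("team", []):
            rows.append({
                "side": side,
                "team": squad.get("name"),
                "name": player.get("name"),
                "number": player.get("number"),
                "position": player.get("position"),
            })
    return rows


# Hypothetical sample mirroring the structure printed above; the real
# data would come from stats['gamePackage']['matchLineUp'].
sample = {
    "home": {"name": "ITALY", "team": [
        {"name": "Jayden Hayward", "number": "15", "position": "FB"},
    ]},
    "away": {"name": "(away team)", "team": []},
}

for row in lineup_rows(sample):
    print("{side} {number:>2} {position} {name}".format(**row))
```

Using .get() throughout means a missing key gives None or an empty list rather than a KeyError, which seemed safer for a payload whose full shape is unknown.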
There's tons of other info in there, but I figured this would get you going.
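One caveat on the regex: it is greedy, so it could over-match if another "};" shows up later on the same line of the script. A possible alternative, sketched below, is to find the assignment and let json.JSONDecoder().raw_decode parse exactly one JSON value from that position. The helper name `extract_initial_state` and the sample HTML are made up for illustration; I have not run this against the live page.

```python
import json
import re


def extract_initial_state(html, marker="window.__INITIAL_STATE__"):
    """Return the JSON object assigned to `marker` in `html`, or None if absent."""
    match = re.search(re.escape(marker) + r"\s*=\s*", html)
    if match is None:
        return None
    # raw_decode parses one JSON value starting at the given index and
    # ignores whatever follows it (the trailing ';' and any further script).
    obj, _end = json.JSONDecoder().raw_decode(html, match.end())
    return obj


# Hypothetical, trimmed-down page used only to exercise the helper.
sample_html = (
    '<script>window.__INITIAL_STATE__ = {"gamePackage": {"matchLineUp": '
    '{"sport": "rugby"}}}; window.other = {"x": 1};</script>'
)

state = extract_initial_state(sample_html)
print(state["gamePackage"]["matchLineUp"]["sport"])
```

With the real page you would pass requests.get(url).text instead of sample_html. Note that this only works if the assigned value is strict JSON; if the script uses a JavaScript object literal with unquoted keys, raw_decode will raise a json.JSONDecodeError.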
Upvotes: 1