Reputation: 139
I am new to Python (using Python 3.6). I am learning it mainly to be able to build a scraper for this page: http://www.nhl.com/stats/player?aggregate=0&gameType=2&report=skatersummary&pos=S&reportType=season&seasonFrom=20162017&seasonTo=20162017&filter=gamesPlayed,gte,1&sort=points,goals,assists
I have tried many things. I originally wanted to use XPath, but after failing I decided to try BeautifulSoup4, and I am getting this error:
for row in soup('table', {'class': 'stat-table'})[0].tbody('tr'):
IndexError: list index out of range
from this code
import urllib.request
from bs4 import BeautifulSoup
soup = BeautifulSoup(urllib.request.urlopen('http://www.nhl.com/stats/player?aggregate=0&gameType=2&report=skatersummary&pos=S&reportType=season&seasonFrom=20162017&seasonTo=20162017&filter=gamesPlayed,gte,1&sort=points,goals,assists'),"lxml")
for row in soup('table', {'class': 'stat-table'})[0].tbody('tr'):
    tds = row('td')
    print(tds[0].string, tds[1].string)
Upvotes: 1
Views: 1335
Reputation: 971
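To make this work, you have to find the URL that the page itself uses to request its data from the internal API. The stats table is filled in by JavaScript after the page loads, so the HTML that urllib downloads contains no 'stat-table' element, which is why indexing with [0] raises the IndexError.
To find that URL, use the developer tools in Google Chrome:
1) Open the developer tools and click the "Network" tab.
2) Refresh the page; you will see all the requests it makes.
3) Filter by "XHR", and there you go: the request that returns the data is in that list.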
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import json
from pprint import pprint

import requests
url = 'http://www.nhl.com/stats/rest/grouped/skaters/basic/season/skatersummary?cayenneExp=seasonId=20162017 and gameTypeId=2&factCayenneExp=gamesPlayed>=1&sort=[{"property":"points","direction":"DESC"},{"property":"goals","direction":"DESC"},{"property":"assists","direction":"DESC"}]'
resp = requests.get(url).text
resp = json.loads(resp)
pprint(resp['data'])
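If you want to change the season or pull just a few columns out of the response, something like the sketch below may help. It rebuilds the same query string from its parts and flattens the 'data' list into rows. The helper names build_stats_url and rows_to_table are made up for illustration, and the column names in the usage comment ('playerName', 'points') are assumptions; check the real keys in the JSON you see in the Network tab.

```python
import json
from urllib.parse import urlencode

# Endpoint taken from the XHR request shown above.
BASE_URL = "http://www.nhl.com/stats/rest/grouped/skaters/basic/season/skatersummary"


def build_stats_url(season_id="20162017", game_type_id=2, min_games=1):
    """Hypothetical helper: rebuild the API URL with its query string URL-encoded."""
    sort = [
        {"property": name, "direction": "DESC"}
        for name in ("points", "goals", "assists")
    ]
    params = {
        "cayenneExp": f"seasonId={season_id} and gameTypeId={game_type_id}",
        "factCayenneExp": f"gamesPlayed>={min_games}",
        "sort": json.dumps(sort, separators=(",", ":")),
    }
    return BASE_URL + "?" + urlencode(params)


def rows_to_table(payload, columns):
    """Hypothetical helper: pick the given keys from each entry of payload['data']."""
    return [[row.get(col) for col in columns] for row in payload.get("data", [])]


# Usage (needs network access and the requests package; column names are assumed):
#   import requests
#   payload = requests.get(build_stats_url()).json()
#   for name, points in rows_to_table(payload, ["playerName", "points"]):
#       print(name, points)
```

Encoding the parameters with urlencode avoids relying on the HTTP client to escape the spaces, brackets, and quotes in the raw URL.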
Upvotes: 4