Need help scraping an NHL statistics table with lxml and xpath

Question

I am new to Python (using Python 3.6) and am learning it mainly so that I can build a scraper for this page: http://www.nhl.com/stats/player?aggregate=0&gameType=2&report=skatersummary&pos=S&reportType=season&seasonFrom=20162017&seasonTo=20162017&filter=gamesPlayed,gte,1&sort=points,goals,assists

I have tried many things. I originally wanted to use XPath, but after that failed I switched to BeautifulSoup4, and now I am getting this error:

    for row in soup('table', {'class': 'stat-table'})[0].tbody('tr'):
    IndexError: list index out of range

from this code

    import urllib.request
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(urllib.request.urlopen('http://www.nhl.com/stats/player?aggregate=0&gameType=2&report=skatersummary&pos=S&reportType=season&seasonFrom=20162017&seasonTo=20162017&filter=gamesPlayed,gte,1&sort=points,goals,assists'), "lxml")

    for row in soup('table', {'class': 'stat-table'})[0].tbody('tr'):
        tds = row('td')
        print(tds[0].string, tds[1].string)

nguaman · Accepted Answer

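To make this work, you have to find the URL of the internal API that the page requests its data from. The stats table is filled in by JavaScript from that API after the page loads, so it is not in the HTML you download with urllib, and BeautifulSoup has no table to find.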
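Concretely, the IndexError means `soup('table', {'class': 'stat-table'})` returned an empty list, so there was nothing at index 0. A quick way to confirm that a downloaded page has no such table, using only the standard library's `html.parser`, is sketched below. `StatTableCounter` and `count_stat_tables` are hypothetical names, and both HTML snippets are made-up stand-ins, not the real NHL page.

```python
from html.parser import HTMLParser


class StatTableCounter(HTMLParser):
    """Counts <table> start tags whose class list includes 'stat-table'."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag != "table":
            return
        # attrs is a list of (name, value) pairs; value may be None.
        classes = (dict(attrs).get("class") or "").split()
        if "stat-table" in classes:
            self.count += 1


def count_stat_tables(html):
    parser = StatTableCounter()
    parser.feed(html)
    parser.close()
    return parser.count


# Made-up page shell: the table would be inserted later by JavaScript,
# so the served HTML only contains an empty container.
shell_html = '<html><body><div id="stats-app"></div><script src="app.js"></script></body></html>'
# Made-up server-rendered page that does contain the table.
static_html = '<table class="stat-table"><tbody><tr><td>A</td><td>1</td></tr></tbody></table>'

if count_stat_tables(shell_html) == 0:
    print("No stat-table in the HTML: indexing [0] would raise IndexError")
print(count_stat_tables(static_html))
```

When the count is zero for the real page, parsing its HTML cannot work, and the data has to come from the request the page makes instead.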

To find that URL, use the developer tools in Google Chrome:

1) Open the developer tools and click the "Network" tab.

2) Refresh the page; the Network tab will list every request the page makes.

3) Filter by "XHR" to show only the requests made from JavaScript, and there you go: the request that returns the stats data is in that list.
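Once you have copied the request URL from the XHR list, it helps to split it into its query parameters: the filters are `cayenneExp` expressions and `sort` is a JSON list. The standard-library sketch below decodes the URL used in this answer and rebuilds it for another season. `describe_query` and `build_summary_url` are hypothetical helper names, and the idea that changing `seasonId` alone selects a different season is an untested assumption about the API.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit

BASE = "http://www.nhl.com/stats/rest/grouped/skaters/basic/season/skatersummary"

# Request URL as copied from the Network > XHR list (same URL as the script).
XHR_URL = (BASE +
           '?cayenneExp=seasonId=20162017 and gameTypeId=2'
           '&factCayenneExp=gamesPlayed>=1'
           '&sort=[{"property":"points","direction":"DESC"},'
           '{"property":"goals","direction":"DESC"},'
           '{"property":"assists","direction":"DESC"}]')


def describe_query(url):
    """Return the query parameters as a dict, decoding the JSON 'sort' value."""
    params = {key: values[0] for key, values in parse_qs(urlsplit(url).query).items()}
    params["sort"] = json.loads(params["sort"])
    return params


def build_summary_url(season_id, game_type=2, min_games=1):
    """Hypothetical helper: rebuild the same request with other filter values.

    Assumes the API accepts these parameters unchanged apart from the values.
    urlencode writes spaces as '+', which query-string decoders treat as a space.
    """
    sort = [{"property": name, "direction": "DESC"}
            for name in ("points", "goals", "assists")]
    query = urlencode({
        "cayenneExp": "seasonId={} and gameTypeId={}".format(season_id, game_type),
        "factCayenneExp": "gamesPlayed>={}".format(min_games),
        "sort": json.dumps(sort, separators=(",", ":")),
    })
    return BASE + "?" + query


for key, value in describe_query(XHR_URL).items():
    print(key, "->", value)
print(build_summary_url(20152016))
```

Building the URL with `urlencode` keeps the quoting of spaces, quotes and braces correct when you change a filter, instead of editing the long string by hand.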

    #!/usr/bin/env python3

    from pprint import pprint

    import requests

    # JSON endpoint found under Network > XHR in Chrome's developer tools.
    # requests percent-encodes characters such as the spaces and quotes in it.
    url = 'http://www.nhl.com/stats/rest/grouped/skaters/basic/season/skatersummary?cayenneExp=seasonId=20162017 and gameTypeId=2&factCayenneExp=gamesPlayed>=1&sort=[{"property":"points","direction":"DESC"},{"property":"goals","direction":"DESC"},{"property":"assists","direction":"DESC"}]'

    response = requests.get(url)
    response.raise_for_status()  # fail loudly on HTTP errors
    resp = response.json()       # parse the JSON body into Python objects

    # The player records are under the "data" key.
    pprint(resp['data'])
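The script only pretty-prints the records. Assuming the `'data'` entry is a list of flat JSON objects, one per player, the standard `csv` module can write it out directly. A minimal sketch follows; `records_to_csv` is a hypothetical name, and the sample records are made up because the exact field names the API returns are not shown here.

```python
import csv
import io


def records_to_csv(records, out):
    """Write a list of dicts to a file-like object as CSV.

    The header is the union of all keys in first-seen order, so a record
    that lacks a field gets an empty cell (DictWriter's default restval).
    """
    fieldnames = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(records)


# Made-up sample records; the real field names come from the API response.
sample = [
    {"playerName": "Player A", "goals": 30, "assists": 50, "points": 80},
    {"playerName": "Player B", "goals": 25, "assists": 40, "points": 65},
]

buffer = io.StringIO()
records_to_csv(sample, buffer)
print(buffer.getvalue())
```

To write a real file, open it with `open('skaters.csv', 'w', newline='')` and pass it together with the list of player records from the parsed JSON.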
