Reputation: 915
Sorry if this is not the place for this question, but I'm not sure where else to ask.
I'm trying to scrape data from rotogrinders.com and I'm running into some challenges.
In particular, I want to scrape past NHL game data from URLs of this format (you can change the date to get another day's data): https://rotogrinders.com/game-stats/nhl-skater?site=draftkings&date=11-22-2016
However, the data on the page is split into pages, and I'm not sure how to get my script to retrieve everything that is shown after clicking the "all" button at the bottom of the page.
Is there a way to do this in Python? Perhaps some library that will allow button clicks? Or is there some way to get the data without actually clicking the button by being clever about the URL/request?
Upvotes: 0
Views: 561
Reputation: 473813
Actually, things are not that complicated in this case. When you click "All", no network requests are issued. All the data is already there, inside a script tag in the HTML; you just need to extract it.
Working code using requests (to download the page content), BeautifulSoup (to parse the HTML and locate the desired script element), re (to extract the desired "player" array from the script) and json (to load the array string into a Python list):
import json
import re

import requests
from bs4 import BeautifulSoup

url = "https://rotogrinders.com/game-stats/nhl-skater?site=draftkings&date=11-22-2016"
response = requests.get(url)
soup = BeautifulSoup(response.content, "html.parser")

# match the JSON array assigned to "var data" inside the script
pattern = re.compile(r"var data = (\[.*?\]);$", re.MULTILINE | re.DOTALL)

# locate the script element whose contents match the pattern
script = soup.find("script", string=pattern)
data = pattern.search(script.string).group(1)
data = json.loads(data)

# printing player names for demonstration purposes
for player in data:
    print(player["player"])
Prints:
Jeff Skinner
Jordan Staal
...
William Carrier
A.J. Greer
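To see the extraction step on its own, without hitting the site, here is a minimal sketch that runs the same regular expression against a made-up HTML snippet. The sample markup and the extract_data_array helper name are assumptions for illustration only; the real page source may be laid out differently.

```python
import json
import re

# Same pattern as in the code above: capture the JSON array assigned to `var data`.
DATA_PATTERN = re.compile(r"var data = (\[.*?\]);$", re.MULTILINE | re.DOTALL)

# Hypothetical stand-in for the page source; the real page will differ.
SAMPLE_HTML = """
<html><body>
<script>
var page = 1;
var data = [{"player": "Jeff Skinner"},
            {"player": "Jordan Staal"}];
</script>
</body></html>
"""


def extract_data_array(html):
    """Return the list assigned to `var data` in html, or [] if it is absent."""
    match = DATA_PATTERN.search(html)
    if match is None:
        return []
    return json.loads(match.group(1))


players = extract_data_array(SAMPLE_HTML)
names = [row["player"] for row in players]
print(names)
```

Note that the lazy match stops at the first "];" that ends a line, so this approach assumes that sequence does not appear earlier inside the array itself. Returning an empty list when nothing matches is just one possible choice; raising an error may suit a scraper better if a missing array means the page layout has changed.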
Upvotes: 1