Reputation: 28640
I've been able to use Beautiful Soup successfully in the past (I'm still learning how to use it), but I'm getting stuck on how to scrape this one specific table:
https://fantasydata.com/nfl-stats/point-spreads-and-odds?season=2017&seasontype=1&week=1
In the past, it's been as simple as:
import requests
from bs4 import BeautifulSoup

url = 'https://fantasydata.com/nfl-stats/point-spreads-and-odds?season=2017&seasontype=1&week=1'
html = requests.get(url)
soup = BeautifulSoup(html.text, "html.parser")
or
from bs4 import BeautifulSoup
from selenium import webdriver

year, nfl_week = 2017, 1  # e.g., 2017 season, week 1
driver = webdriver.Chrome()
page_url = 'https://fantasydata.com/nfl-stats/point-spreads-and-odds?season=%s&seasontype=1&week=%s' % (year, nfl_week)
driver.get(page_url)
soup = BeautifulSoup(driver.page_source, "lxml")
but none of the data in the table is there to be parsed from the page source.
Is there a better way to handle this with beautiful soup?
EDIT:
Ok, so I went back and did:
from bs4 import BeautifulSoup
from selenium import webdriver

year, nfl_week = 2017, 1  # e.g., 2017 season, week 1
driver = webdriver.Chrome()
page_url = 'https://fantasydata.com/nfl-stats/point-spreads-and-odds?season=%s&seasontype=1&week=%s' % (year, nfl_week)
driver.get(page_url)
soup = BeautifulSoup(driver.page_source, "lxml")
again. THIS time, the data was showing up. I assumed that since the data was being loaded in dynamically, Selenium was the correct way to go, but I got thrown off when it didn't work.
Any ideas on why it didn't work the first time? I didn't close the browser or anything before the page loaded.
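In case it helps, here's roughly how I could wait for the table to render before grabbing the page source (just a sketch; waiting on a table tag is a guess at the page's markup):
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get('https://fantasydata.com/nfl-stats/point-spreads-and-odds?season=2017&seasontype=1&week=1')
# Wait (up to 20s) for the JS-rendered table to appear before reading page_source
WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.TAG_NAME, 'table')))
soup = BeautifulSoup(driver.page_source, 'lxml')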
Upvotes: 0
Views: 279
Reputation: 1267
Okay, so I did some snooping around, and someone beat me to it! I did the same thing as the other answer, but in the Firefox dev tools (Ctrl+Shift+K), using the XHR section of the Network tab. It looks like the API call that populates the table on your site is a POST request to https://fantasydata.com/NFLTeamStats/Odds_Read. Here is an object with all of the available parameters (blank values left empty):
{
    "filter": "",
    "filters.endweek": "",
    "filters.exportType": "",
    "filters.leaguetype": "",
    "filters.minimumsnaps": "",
    "filters.playerid": "",
    "filters.position": "",
    "filters.scope": "",
    "filters.scoringsystem": "",
    "filters.searchtext": "",
    "filters.season": 2017,
    "filters.seasontype": 1,
    "filters.startweek": "",
    "filters.stattype": "",
    "filters.subscope": "",
    "filters.team": "",
    "filters.teamaspect": "",
    "filters.week": 1,
    "group": "",
    "page": 1,
    "pageSize": 50,
    "sort": ""
}
The body of the POST will be an object like the one above. If they don't block cross-origin requests, you can just use the Python requests library directly. If they do, you can try to mimic the headers and options they set, or (I forget exactly how, but I know it's possible) inject a JavaScript AJAX request from within the page via Selenium. Just as a side note: you must use WebDriverWait or some other async mechanism to await the response if you want to automate async JS injection from Python.
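If cross-origin requests are blocked, one way to do the injection (a rough sketch, not tested against this site; the parameter subset and the 'Data' key in the response are taken from the object above and the other answer) is Selenium's execute_async_script, which hands your script a callback to call when the response arrives:
from selenium import webdriver

driver = webdriver.Chrome()
driver.get('https://fantasydata.com/nfl-stats/point-spreads-and-odds?season=2017&seasontype=1&week=1')
driver.set_script_timeout(30)  # give the async callback time to fire

script = """
var done = arguments[arguments.length - 1];   // callback supplied by execute_async_script
fetch('/NFLTeamStats/Odds_Read', {            // same-origin request, so the site's cookies come along
    method: 'POST',
    body: new URLSearchParams({
        'page': '1',
        'pageSize': '50',
        'filters.season': '2017',
        'filters.seasontype': '1',
        'filters.week': '1'
    })
}).then(r => r.json()).then(done);
"""
data = driver.execute_async_script(script)    # blocks until done(...) is called
print(data['Data'][0])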
Upvotes: 1
Reputation: 903
You don't need BeautifulSoup or Selenium for this. The data is available as a Python dictionary after POSTing a query to https://fantasydata.com/NFLTeamStats/Odds_Read.
import requests

query = {  # just mimicking the sample query that I saw after loading your link
    'page': 1,
    'pageSize': 50,
    'filters.season': 2017,
    'filters.seasontype': 1,
    'filters.week': 1,
}
response = requests.post('https://fantasydata.com/NFLTeamStats/Odds_Read', data=query)
data = response.json()
data
{'Data': [{'Date': 'September 7, 2017 8:30 PM', 'Favorite': 'at Patriots', 'PointSpread': '-9.0', 'UnderDog': 'Chiefs', 'OverUnder': '48.0', 'AwayTeamMoneyLine': '+400', 'HomeTeamMoneyLine': '-450'}, {'Date': 'September 10, 2017 1:00 PM', 'Favorite': 'Buccaneers', 'PointSpread': '-2.5', 'UnderDog': 'at Dolphins', 'OverUnder': '41.5', 'AwayTeamMoneyLine': '-140', 'HomeTeamMoneyLine': '+120'}, {'Date': 'September 10, 2017 1:00 PM', 'Favorite': 'at ...
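Each entry in data['Data'] is a flat dict, so as a minimal follow-up sketch (the file name is arbitrary) you can dump the whole table to CSV with the standard library:
import csv

rows = data['Data']  # list of flat dicts, one per game, as shown above
with open('week1_odds.csv', 'w', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=rows[0].keys())
    writer.writeheader()
    writer.writerows(rows)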
You can find this method by studying the Network section of the Chrome developer tools (press F12), especially the XHR subsection.
Upvotes: 1