I am trying to scrape a site that has a drop-down menu where the user can select the year of the data to display. However, I'm stuck on how to implement this.
Here is the website URL: https://www.pgatour.com/tournaments/masters-tournament/past-results.html
This is for a personal project to scrape golf data for each major tournament for each year. I know how to pull the desired stats once the year has been selected.
Here is an excerpt of the website's HTML for the drop-down menu:
<select name="year" id="pastResultsYearSelector" class="hasCustomSelect"
style="-webkit-appearance: menulist-button; width: 180px; position: absolute;
opacity: 0; height: 42px; font-size: 18px;">
<option value="2019" selected="selected">2019</option>
<option value="2018">2018</option>
<option value="2017">2017</option>
<option value="2016">2016</option>
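Instead of hard-coding the years, one option is to read them from the `<option>` values of the `pastResultsYearSelector` menu. This is a minimal sketch with BeautifulSoup. It assumes that the page's initial HTML contains the full `<select>`. The excerpt above is used as a stand-in and is closed with a `</select>` that the excerpt omits. `available_years` is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Stand-in for the page: the drop-down excerpt from the question, closed so it parses cleanly.
SELECT_HTML = """
<select name="year" id="pastResultsYearSelector" class="hasCustomSelect">
  <option value="2019" selected="selected">2019</option>
  <option value="2018">2018</option>
  <option value="2017">2017</option>
  <option value="2016">2016</option>
</select>
"""


def available_years(html: str) -> list[int]:
    """Return the year values listed in the pastResultsYearSelector menu (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    select = soup.find("select", id="pastResultsYearSelector")
    if select is None:  # menu not present in this HTML
        return []
    return [int(opt["value"]) for opt in select.find_all("option")]


if __name__ == "__main__":
    print(available_years(SELECT_HTML))  # [2019, 2018, 2017, 2016]
```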
Here is what I've tried so far:
import requests
from bs4 import BeautifulSoup
headers = {
'user-agent':
'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.2214.115 Safari/537.36'
}
data = {
'name':'2019', 'id':'pastResultYearSelector', 'class':'hasCustomSelect',
'style':'-webkit-appearance: menulist-button; width: 180px; position: absolute; opacity: 0; height: 42px; font-size: 18px;'
}
url = "https://www.pgatour.com/tournaments/masters-tournament/past-results.html"
r = requests.post(url, data=data, headers=headers, timeout=20)
soup = BeautifulSoup(r.text, 'html.parser')
However, the request seems to be invalid: the response says the requested page was not found.
As mentioned in the comments, you can request the URL that the page itself uses to update its content when a year is selected:
import requests
from bs4 import BeautifulSoup as bs
r = requests.get('https://www.pgatour.com/content/pgatour/tournaments/masters-tournament/past-results/jcr:content/mainParsys/pastresults.selectedYear.{}.html'.format(2017))
soup = bs(r.content, 'lxml')
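Only the year changes in that URL, so you can wrap it in a small helper. This is a sketch. `past_results_url` is a hypothetical name. The `tournament` parameter assumes that other tournaments use the same path pattern with their own slug. Only the Masters URL is confirmed here, so check each slug in the browser's network tab before relying on it.

```python
# URL pattern from the answer; {tournament} is an ASSUMPTION that other events share it.
BASE = (
    "https://www.pgatour.com/content/pgatour/tournaments/{tournament}"
    "/past-results/jcr:content/mainParsys/pastresults.selectedYear.{year}.html"
)


def past_results_url(year: int, tournament: str = "masters-tournament") -> str:
    """Build the URL the past-results page requests for a given year (hypothetical helper)."""
    return BASE.format(tournament=tournament, year=year)


if __name__ == "__main__":
    print(past_results_url(2017))
```

The result can be passed straight to `requests.get` in place of the inline `.format(2017)` call.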
You will still want to tidy up the resulting DataFrame, but you can use pandas to grab and handle the table:
import requests
from io import StringIO
from bs4 import BeautifulSoup as bs
import pandas as pd
r = requests.get('https://www.pgatour.com/content/pgatour/tournaments/masters-tournament/past-results/jcr:content/mainParsys/pastresults.selectedYear.{}.html'.format(2017))
soup = bs(r.content, 'lxml')
# Wrap the HTML string in StringIO: pandas 2.1+ deprecates passing literal HTML to read_html
table = pd.read_html(StringIO(str(soup.select_one('table'))))[0]
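To collect several years at once, you can repeat the request for each year, tag each table with its year, and stack the results. This is a sketch under a few assumptions. `fetch_year` and `combine_years` are hypothetical names. The results table is assumed to be the first table that `pd.read_html` finds in each response. The list of years is illustrative.

```python
from io import StringIO

import pandas as pd
import requests

# URL pattern from the answer; only the year changes between requests.
URL = (
    "https://www.pgatour.com/content/pgatour/tournaments/masters-tournament"
    "/past-results/jcr:content/mainParsys/pastresults.selectedYear.{}.html"
)


def fetch_year(year: int) -> pd.DataFrame:
    """Download one year's results; ASSUMES the results table is the first one parsed."""
    r = requests.get(URL.format(year), timeout=20)
    r.raise_for_status()  # fail loudly on 404s instead of parsing an error page
    return pd.read_html(StringIO(r.text))[0]


def combine_years(tables: dict[int, pd.DataFrame]) -> pd.DataFrame:
    """Add a 'year' column to each year's table and stack them into one DataFrame."""
    frames = [df.assign(year=year) for year, df in tables.items()]
    return pd.concat(frames, ignore_index=True)


if __name__ == "__main__":
    years = [2019, 2018, 2017]  # illustrative; could come from the drop-down's options
    results = combine_years({y: fetch_year(y) for y in years})
    print(results.head())
```

Keeping the download and the combining step separate means you can tidy each year's table before stacking them, and you can test `combine_years` without touching the network.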