
Beautifulsoup web-scraping website with drop down menu

I am trying to scrape a site that has a drop down menu where the user can select the year for the data to display. However, I seem to be stuck in my implementation of this.

Here is the website URL: https://www.pgatour.com/tournaments/masters-tournament/past-results.html

This is for a personal project to scrape golf data for each major tournament for each year. I know how to pull the desired stats once the year has been selected.

Here is an excerpt of the website's HTML for the drop-down menu:

<select name="year" id="pastResultsYearSelector" class="hasCustomSelect"
style="-webkit-appearance: menulist-button; width: 180px; position: absolute;
opacity: 0; height: 42px; font-size: 18px;">
            <option value="2019" selected="selected">2019</option>
            <option value="2018">2018</option>
            <option value="2017">2017</option>
            <option value="2016">2016</option>

Here is what I've tried so far:

import requests
from bs4 import BeautifulSoup

headers = {
    'user-agent':
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.2214.115 Safari/537.36'
    }

data = {
    'name':'2019', 'id':'pastResultYearSelector', 'class':'hasCustomSelect',
    'style':'-webkit-appearance: menulist-button; width: 180px; position: absolute; opacity: 0; height: 42px; font-size: 18px;'
    }

url = "https://www.pgatour.com/tournaments/masters-tournament/past-results.html"

r = requests.post(url, data=data, headers=headers, timeout=20)

soup = BeautifulSoup(r.text, 'html.parser')

However, my request seems to be invalid: the response says the requested page was not found.

Upvotes: 3

Answers (1)

QHarr

Reputation: 84475

As mentioned in the comments, you can request the same URL that the page itself uses to update its content by year:

import requests
from bs4 import BeautifulSoup as bs

r = requests.get('https://www.pgatour.com/content/pgatour/tournaments/masters-tournament/past-results/jcr:content/mainParsys/pastresults.selectedYear.{}.html'.format(2017))

soup = bs(r.content, 'lxml')
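Since the goal is data for every year, the same URL pattern can be filled in for several years at once. This is only a minimal sketch, assuming the pattern above holds for each year; `URL_TEMPLATE` and `build_year_urls` are hypothetical names, and the years are just the ones visible in the question's dropdown excerpt.

```python
# Hypothetical helper: fills the URL pattern from this answer for several years.
URL_TEMPLATE = (
    "https://www.pgatour.com/content/pgatour/tournaments/masters-tournament/"
    "past-results/jcr:content/mainParsys/pastresults.selectedYear.{}.html"
)


def build_year_urls(years):
    """Return a {year: url} mapping for the given years."""
    return {year: URL_TEMPLATE.format(year) for year in years}


# Years taken from the dropdown excerpt in the question (an assumption about
# which years are available, not a verified list).
urls = build_year_urls([2019, 2018, 2017, 2016])
for year, url in urls.items():
    print(year, url)
```

Each URL can then be passed to `requests.get` and parsed exactly as in the snippet above.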

You will want to tidy the resulting DataFrame, but you could use pandas to grab and handle the table:

from io import StringIO

import requests
from bs4 import BeautifulSoup as bs
import pandas as pd

r = requests.get('https://www.pgatour.com/content/pgatour/tournaments/masters-tournament/past-results/jcr:content/mainParsys/pastresults.selectedYear.{}.html'.format(2017))
soup = bs(r.content, 'lxml')
# Wrap the markup in StringIO: recent pandas versions deprecate passing a literal HTML string to read_html
table = pd.read_html(StringIO(str(soup.select_one('table'))))[0]
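Rather than hard-coding the years, you could read them from the dropdown's `<option>` elements. The following is a sketch only: it assumes the `<select>` markup from the question is present in the HTML you fetched, uses an inline copy of that markup (with a closing `</select>` and the `style` attribute omitted) instead of a live request, and `extract_years` is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Dropdown markup copied from the question; closed with </select> here so
# the sample is well-formed.
html = """
<select name="year" id="pastResultsYearSelector" class="hasCustomSelect">
    <option value="2019" selected="selected">2019</option>
    <option value="2018">2018</option>
    <option value="2017">2017</option>
    <option value="2016">2016</option>
</select>
"""


def extract_years(page_html):
    """Return the year values listed in the year selector, in page order."""
    soup = BeautifulSoup(page_html, "html.parser")
    # Select every <option> nested under the element with the selector's id.
    options = soup.select("#pastResultsYearSelector option")
    return [int(option["value"]) for option in options]


years = extract_years(html)
print(years)
```

The resulting list can be fed into the year-specific URL in place of the hard-coded `2017`.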

Upvotes: 1
