Reputation: 1
I am trying to scrape a list of dates from: https://ca.finance.yahoo.com/quote/AAPL/options
The dates are located within a drop down menu right above the option chain. I've scraped text from this website before but this text is using a 'select' & 'option' syntax. How would I adjust my code to gather this type of text? I have used many variations of the code below to try and scrape the text but am having no luck.
Thank you very much.
import bs4
import requests
datesLink = ('https://ca.finance.yahoo.com/quote/AAPL/options')
datesPage = requests.get(datesLink)
datesSoup = BeautifulSoup(datesPage.text, 'lxml')
datesQuote = datesSoup.find('div', {'class': 'Cf Pt(18px)controls'}).find('option').text
Upvotes: 0
Views: 205
Reputation: 1641
The reason you can't seem to extract this dropdown list is because this list is generated dynamically, and the easiest way to know this is by saving your html content into a file and giving it a manual look, in a text editor.
You CAN, however, parse those dates out of the script source code, which is in the same html file, using some ugly regex way. For example, this seems to work:
import requests, re
from datetime import *
content = requests.get('https://ca.finance.yahoo.com/quote/AAPL/options').content.decode()
match = re.search(r'"OptionContractsStore".*?"expirationDates".*?\[(.*?)\]', content)
dates = [datetime.fromtimestamp(int(x), tz=timezone.utc) for x in match.group(1).split(',')]
for d in dates:
print(d.strftime('%Y-%m-%d'))
It should be obvious that parsing stuff in such a nasty way isn't fool-proof, and likely going to break sooner rather than later. But the same can be said about any kind of web scraping entirely.
Upvotes: 1
Reputation: 15606
You can simply read HTML directly to Pandas:
import pandas as pd
URI = 'https://ca.finance.yahoo.com/quote/AAPL/options'
df = pd.read_html(URI)[0] #[1] depending on the table you wish for
Upvotes: 0