Reputation: 235
As some of you probably know by now, Yahoo! Finance appears to have discontinued its API for stock market data. While I am aware of the fix-yahoo-finance
solution, I was trying to implement a more stable solution in my code by scraping historical data directly from Yahoo.
So here is what I have for the moment:
import requests
from bs4 import BeautifulSoup
page = requests.get("https://finance.yahoo.com/quote/AAPL/history?period1=345423600&period2=1495922400&interval=1d&filter=history&frequency=1d")
soup = BeautifulSoup(page.content, 'html.parser')
soup
print(soup.prettify())
To get the data from the Yahoo table, I can do:
c=soup.find_all('tbody')
print(c)
My question is, how do I turn "c" into a nicer dataframe? Thanks!
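For reference, here is a minimal sketch of the kind of conversion being asked about. It runs on a small hard-coded table rather than the live page, so the markup, the column names and the values in `SAMPLE_HTML` are made up for illustration, and the helper name `tbody_to_frame` is hypothetical. The real Yahoo page may be structured differently.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Made-up stand-in for the history page; the real Yahoo markup may differ.
SAMPLE_HTML = """
<table>
  <thead><tr><th>Date</th><th>Open</th><th>Close</th><th>Volume</th></tr></thead>
  <tbody>
    <tr><td>Jan 04, 2000</td><td>10.00</td><td>11.00</td><td>1,000</td></tr>
    <tr><td>Jan 03, 2000</td><td colspan="3">0.50 Dividend</td></tr>
    <tr><td>Jan 02, 2000</td><td>9.00</td><td>9.50</td><td>2,000</td></tr>
  </tbody>
</table>
"""


def tbody_to_frame(html):
    """Turn the <tbody> rows of the first table in `html` into a DataFrame."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    columns = [th.get_text(strip=True) for th in table.find("thead").find_all("th")]

    rows = []
    for tr in table.find("tbody").find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        # Rows with a different cell count (e.g. dividend notes) are skipped.
        if len(cells) == len(columns):
            rows.append(cells)

    df = pd.DataFrame(rows, columns=columns)
    df["Date"] = pd.to_datetime(df["Date"], format="%b %d, %Y")
    for col in columns[1:]:
        # Strip thousands separators before converting to numbers.
        df[col] = pd.to_numeric(df[col].str.replace(",", ""))
    return df


df = tbody_to_frame(SAMPLE_HTML)
print(df)
```

The idea is simply one list of cell strings per `<tr>`, passed to `pd.DataFrame` with the header cells as column names, followed by type conversion.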
Upvotes: 4
Views: 11064
Reputation: 11
Alternatively, use the Python libraries yfinance and yahoofinancials.
import pandas as pd
import yfinance as yf
from yahoofinancials import YahooFinancials
Choose the company's stock ticker, e.g. 'AAPL'.
df = yf.download('AAPL')
This stores the historical stock data in the DataFrame df.
Upvotes: 1
Reputation: 375
I wrote this to get historical data from YF directly from the CSV download link. It needs to make two requests: one to get the cookie and the crumb, and another to get the data. It returns a pandas DataFrame.
import re
from io import StringIO
from datetime import datetime, timedelta
import requests
import pandas as pd
class YahooFinanceHistory:
    timeout = 2  # seconds to wait for the crumb request
    crumb_link = 'https://finance.yahoo.com/quote/{0}/history?p={0}'
    crumble_regex = r'CrumbStore":{"crumb":"(.*?)"}'
    quote_link = 'https://query1.finance.yahoo.com/v7/finance/download/{quote}?period1={dfrom}&period2={dto}&interval=1d&events=history&crumb={crumb}'

    def __init__(self, symbol, days_back=7):
        self.symbol = symbol
        self.session = requests.Session()
        self.dt = timedelta(days=days_back)

    def get_crumb(self):
        # First request: sets the session cookie and exposes the crumb in the page source
        response = self.session.get(self.crumb_link.format(self.symbol), timeout=self.timeout)
        response.raise_for_status()
        match = re.search(self.crumble_regex, response.text)
        if not match:
            raise ValueError('Could not get crumb from Yahoo Finance')
        else:
            self.crumb = match.group(1)

    def get_quote(self):
        if not hasattr(self, 'crumb') or len(self.session.cookies) == 0:
            self.get_crumb()
        # datetime.now() instead of utcnow(): .timestamp() treats a naive datetime as local time
        now = datetime.now()
        dateto = int(now.timestamp())
        datefrom = int((now - self.dt).timestamp())
        # Second request: download the CSV using the same session (cookie) and the crumb
        url = self.quote_link.format(quote=self.symbol, dfrom=datefrom, dto=dateto, crumb=self.crumb)
        response = self.session.get(url)
        response.raise_for_status()
        return pd.read_csv(StringIO(response.text), parse_dates=['Date'])
You can use it like this:
df = YahooFinanceHistory('AAPL', days_back=30).get_quote()
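The two pieces that do not need the network, pulling the crumb out of the page text and computing the period1/period2 values, can be tried in isolation. The sketch below assumes the page text contains a fragment shaped like the regex above expects. The helper names `extract_crumb` and `epoch_window` and the sample string are made up for illustration and are not part of the class.

```python
import re
from datetime import datetime, timedelta, timezone

# Same pattern as crumble_regex in the class above.
CRUMB_REGEX = r'CrumbStore":{"crumb":"(.*?)"}'


def extract_crumb(page_text):
    """Return the crumb embedded in the page text, or raise ValueError."""
    match = re.search(CRUMB_REGEX, page_text)
    if not match:
        raise ValueError("Could not find a crumb in the page text")
    return match.group(1)


def epoch_window(days_back, now=None):
    """Return (period1, period2) as integer Unix timestamps in seconds."""
    # A timezone-aware datetime keeps .timestamp() unambiguous.
    now = now or datetime.now(timezone.utc)
    start = now - timedelta(days=days_back)
    return int(start.timestamp()), int(now.timestamp())


# Made-up fragment, only shaped like what the regex looks for.
sample = '... "CrumbStore":{"crumb":"abc123"} ...'
print(extract_crumb(sample))

# Fixed reference time so the result is reproducible.
fixed_now = datetime(2017, 5, 27, tzinfo=timezone.utc)
print(epoch_window(30, now=fixed_now))
```

If Yahoo changes the page so that the pattern no longer matches, `extract_crumb` raises `ValueError`, which is the same failure `get_crumb` reports.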
Upvotes: 17