WhelanG

Reputation: 235

Scraping historical data from Yahoo Finance with Python

As some of you probably know by now, Yahoo! Finance appears to have discontinued its API for stock market data. I am aware of the fix-yahoo-finance workaround, but I wanted a more stable solution in my code, so I am trying to scrape historical data directly from Yahoo.

So here is what I have for the moment:

import requests
from bs4 import BeautifulSoup

page = requests.get("https://finance.yahoo.com/quote/AAPL/history?period1=345423600&period2=1495922400&interval=1d&filter=history&frequency=1d")
soup = BeautifulSoup(page.content, 'html.parser')
print(soup.prettify())  # pretty-print the parsed HTML for inspection

To get the data from the Yahoo table, I can do:

c = soup.find_all('tbody')
print(c)

My question is: how do I turn "c" into a tidy pandas DataFrame? Thanks!
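For reference, the step being asked about can be sketched roughly as follows. This is only an illustration under assumptions: the sample markup, the made-up values, the column names, and the helper name `tbody_to_dataframe` are all hypothetical, and it assumes the table rows are actually present in the HTML that `requests` receives.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical sample markup with made-up values; the real page may differ.
SAMPLE_HTML = """
<table>
  <tbody>
    <tr><td>Jan 04, 2017</td><td>10.50</td><td>10.75</td><td>1,000,000</td></tr>
    <tr><td>Jan 03, 2017</td><td>10.00</td><td>10.50</td><td>2,500,000</td></tr>
    <tr><td colspan="4">0.10 Dividend</td></tr>
  </tbody>
</table>
"""


def tbody_to_dataframe(html, columns):
    """Turn the <tr>/<td> cells of the first <tbody> into a typed DataFrame."""
    soup = BeautifulSoup(html, "html.parser")
    tbody = soup.find("tbody")
    if tbody is None:
        raise ValueError("No <tbody> found in the HTML")
    rows = []
    for tr in tbody.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        # Keep only rows with one cell per column (skips e.g. dividend rows).
        if len(cells) == len(columns):
            rows.append(cells)
    df = pd.DataFrame(rows, columns=columns)
    # The date format here is an assumption based on the sample above.
    df[columns[0]] = pd.to_datetime(df[columns[0]], format="%b %d, %Y")
    for col in columns[1:]:
        # Strip thousands separators before converting to numbers.
        df[col] = pd.to_numeric(df[col].str.replace(",", ""), errors="coerce")
    return df


df = tbody_to_dataframe(SAMPLE_HTML, ["Date", "Open", "Close", "Volume"])
print(df)
```

With the real page, `page.content` would take the place of `SAMPLE_HTML`, and the column list would need to match whatever the table actually contains.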

Upvotes: 4

Views: 11064

Answers (2)

akshita

Reputation: 11

Alternatively, use the Python libraries yfinance and yahoofinancials.

import pandas as pd
import yfinance as yf
from yahoofinancials import YahooFinancials

Choose the company's stock ticker, e.g. 'AAPL':

df = yf.download('AAPL')

This stores the historical stock data in the DataFrame df.

Upvotes: 1

Mike D

Reputation: 375

I wrote this to get historical data from Yahoo Finance directly via the CSV download link. It makes two requests: one to get the cookie and the crumb, and another to get the data. It returns a pandas DataFrame:

import re
from io import StringIO
from datetime import datetime, timedelta

import requests
import pandas as pd


class YahooFinanceHistory:
    timeout = 2
    crumb_link = 'https://finance.yahoo.com/quote/{0}/history?p={0}'
    crumble_regex = r'CrumbStore":{"crumb":"(.*?)"}'
    quote_link = 'https://query1.finance.yahoo.com/v7/finance/download/{quote}?period1={dfrom}&period2={dto}&interval=1d&events=history&crumb={crumb}'

    def __init__(self, symbol, days_back=7):
        self.symbol = symbol
        self.session = requests.Session()
        self.dt = timedelta(days=days_back)

    def get_crumb(self):
        response = self.session.get(self.crumb_link.format(self.symbol), timeout=self.timeout)
        response.raise_for_status()
        match = re.search(self.crumble_regex, response.text)
        if not match:
            raise ValueError('Could not get crumb from Yahoo Finance')
        else:
            self.crumb = match.group(1)

    def get_quote(self):
        if not hasattr(self, 'crumb') or len(self.session.cookies) == 0:
            self.get_crumb()
        # A naive utcnow() would be read as local time by .timestamp(); use local now() instead.
        now = datetime.now()
        dateto = int(now.timestamp())
        datefrom = int((now - self.dt).timestamp())
        url = self.quote_link.format(quote=self.symbol, dfrom=datefrom, dto=dateto, crumb=self.crumb)
        response = self.session.get(url, timeout=self.timeout)
        response.raise_for_status()
        return pd.read_csv(StringIO(response.text), parse_dates=['Date'])

You can use it like this:

df = YahooFinanceHistory('AAPL', days_back=30).get_quote()
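To make the two steps easier to follow, here is a small offline sketch of what the class does between the requests: pull the crumb out of the page text with the same regex, then build the download URL from epoch-second bounds. The `sample_page` fragment, the crumb value, and the helper name `period_bounds` are hypothetical; the real page content is not shown here and may differ.

```python
import re
from datetime import datetime, timedelta, timezone

CRUMB_REGEX = r'CrumbStore":{"crumb":"(.*?)"}'
QUOTE_LINK = ('https://query1.finance.yahoo.com/v7/finance/download/{quote}'
              '?period1={dfrom}&period2={dto}&interval=1d&events=history&crumb={crumb}')

# Hypothetical fragment of the history page; only the CrumbStore part matters.
sample_page = '<script>{"CrumbStore":{"crumb":"abc123XYZ"},"other":{}}</script>'

# Step 1: extract the crumb (None if the pattern is not found).
match = re.search(CRUMB_REGEX, sample_page)
crumb = match.group(1) if match else None


def period_bounds(days_back, now=None):
    """Return (period1, period2) as Unix epoch seconds covering days_back days."""
    now = now or datetime.now(timezone.utc)
    return int((now - timedelta(days=days_back)).timestamp()), int(now.timestamp())


# Step 2: build the CSV download URL that the second request would fetch.
period1, period2 = period_bounds(7)
url = QUOTE_LINK.format(quote='AAPL', dfrom=period1, dto=period2, crumb=crumb)
print(crumb)
print(url)
```

No network call is made here; in the class above, the same session object carries the cookie from the first request into the second one.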

Upvotes: 17
