showkey

Reputation: 338

How to get the max historical price data from yahoo finance?

I want to get the max historical price data from Yahoo Finance with Scrapy.
Here is the URL for FB (Facebook) max historical price data.

https://query1.finance.yahoo.com/v7/finance/download/FNMA?period1=221115600&period2=1508472000&interval=1d&events=history&crumb=1qRuQKELxmM

In order to write a stock price web crawler, there are two problems I can't solve.
1. How to get the argument period1?
You can get it by hand in the web page, just by clicking Max.
How do I get this argument with Python code?
Different stocks have different period1 values.


2. How do I create the argument crumb=1qRuQKELxmM automatically, given that different stocks have different crumb values?
Here is my spider for max historical stock data using the Scrapy framework.

import scrapy

class TestSpider(scrapy.Spider):
    name = "quotes"
    allowed_domains = ["finance.yahoo.com"]

    def __init__(self, *args, **kw):
        super().__init__(*args, **kw)
        self.timeout = 10

    def start_requests(self):
        stockName = []  # get the ticker list (code omitted)
        for stock in stockName:
            period1 = ...  # how to fill it?
            crumb = ...    # how to fill it?
            per_stock_max_data = ("https://query1.finance.yahoo.com/v7/finance/"
                                  "download/" + stock + "?period1=" + period1 +
                                  "&period2=1508472000&interval=1d"
                                  "&events=history&crumb=" + crumb)
            yield scrapy.Request(per_stock_max_data, callback=self.parse)

    def parse(self, response):
        content = response.body
        target = response.url
        #do something

How do I fill in the blanks above in my web crawler?

Upvotes: 5

Views: 10490

Answers (6)

alphazwest

Reputation: 4480

Came across this thread and wanted to add another option: the Python yfinance package. The Ticker API has a history method whose time period can be set to 'max', so that the entirety of the available asset data is returned. Example:

import yfinance as yf

spy = yf.Ticker('SPY').history(
    period='max',
    interval='1d'
)

The yfinance package utilizes the pandas package and returns spy as a DataFrame object as such:

                                 Open        High  ...  Stock Splits  Capital Gains
Date                                               ...                             
1993-01-29 00:00:00-05:00   25.236177   25.236177  ...           0.0            0.0
1993-02-01 00:00:00-05:00   25.236163   25.397589  ...           0.0            0.0
1993-02-02 00:00:00-05:00   25.379641   25.469322  ...           0.0            0.0
1993-02-03 00:00:00-05:00   25.487262   25.738368  ...           0.0            0.0
1993-02-04 00:00:00-05:00   25.810116   25.881861  ...           0.0            0.0
...                               ...         ...  ...           ...            ...
2023-01-24 00:00:00-05:00  398.880005  401.149994  ...           0.0            0.0
2023-01-25 00:00:00-05:00  395.950012  400.700012  ...           0.0            0.0
2023-01-26 00:00:00-05:00  403.130005  404.920013  ...           0.0            0.0
2023-01-27 00:00:00-05:00  403.660004  408.160004  ...           0.0            0.0
2023-01-30 00:00:00-05:00  402.799988  405.119995  ...           0.0            0.0

[7555 rows x 8 columns]

At the time of writing (1/30/2023) this represents the full daily dataset for SPY available via Yahoo Finance. Alternatively, passing None for the period argument returns the same data. Worth noting: omitting the period entirely falls back to the default value of "1mo".

Note: looking at the source code for yfinance reveals that a period of "max" or None results in a start date of -2208994789 (i.e. the year 1900 in Unix time).

Upvotes: 1

Michael Dz

Reputation: 3854

As I understand it, you want to download all available data for a specific ticker. To do this you actually don't need to compute the period1 parameter: if you pass 0 as period1, the Yahoo API defaults to the oldest available date.

To download quotes the way you showed in the question, we unfortunately have to deal with cookies. I'll provide a solution without using Scrapy; only the ticker itself is required:

import re
import time
from io import StringIO

import pandas as pd
import requests


def get_yahoo_ticker_data(ticker):
    res = requests.get('https://finance.yahoo.com/quote/' + ticker + '/history')
    yahoo_cookie = res.cookies['B']
    yahoo_crumb = None
    pattern = re.compile(r'.*"CrumbStore":\{"crumb":"(?P<crumb>[^"]+)"\}')
    for line in res.text.splitlines():
        m = pattern.match(line)
        if m is not None:
            yahoo_crumb = m.groupdict()['crumb']
    cookie_tuple = yahoo_cookie, yahoo_crumb

    current_date = int(time.time())
    url_kwargs = {'symbol': ticker, 'timestamp_end': current_date,
        'crumb': cookie_tuple[1]}
    url_price = 'https://query1.finance.yahoo.com/v7/finance/download/' \
                '{symbol}?period1=0&period2={timestamp_end}&interval=1d&events=history' \
                '&crumb={crumb}'.format(**url_kwargs)


    response = requests.get(url_price, cookies={'B': cookie_tuple[0]})

    return pd.read_csv(StringIO(response.text), parse_dates=['Date'])

If you really need the oldest date then you can use the code above and extract the first date from the response.

get_yahoo_ticker_data(ticker='AAPL')
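Once the CSV body is in hand, the earliest date can be pulled out like this. A minimal sketch; the sample rows below are made up for illustration, but the header matches what Yahoo's download endpoint returns:

```python
from io import StringIO

import pandas as pd

# Fabricated sample of the CSV body the download endpoint returns;
# real responses have the same header but thousands of rows.
sample_csv = """Date,Open,High,Low,Close,Adj Close,Volume
1980-12-12,0.51,0.52,0.51,0.51,0.41,117258400
1980-12-15,0.49,0.49,0.49,0.49,0.39,43971200
"""

df = pd.read_csv(StringIO(sample_csv), parse_dates=['Date'])

# The oldest available date is simply the minimum of the Date column.
oldest = df['Date'].min()
print(oldest.date())  # 1980-12-12
```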

I do know that web scraping is not an efficient option, but it's the only one we have because Yahoo has already decommissioned all of its APIs. You might find third-party solutions, but all of them use scraping inside their source code and add boilerplate that decreases overall performance.

Upvotes: 9

kupinah

Reputation: 3

If you just put 0 in place of period1, it should work because the interval will run from the beginning of time up to the moment you clicked.

For period2 you can just use a really large int such as 1900000000, and it will take everything that occurs up to that date.
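For reference, these values are Unix timestamps (seconds since 1970-01-01 UTC), so it's easy to check what any candidate value corresponds to:

```python
from datetime import datetime, timezone

# period1/period2 are Unix timestamps: seconds since 1970-01-01 UTC.
print(datetime.fromtimestamp(0, tz=timezone.utc).date())            # 1970-01-01
print(datetime.fromtimestamp(1900000000, tz=timezone.utc).date())   # 2030-03-17
print(datetime.fromtimestamp(-2208988800, tz=timezone.utc).date())  # 1900-01-01
```

So 1900000000 as period2 covers everything through early 2030, and a negative value can reach back before the epoch.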

Upvotes: 0

Aleksandr Borisov

Reputation: 2212

  1. Both period1 and period2 are "seconds since epoch" values; you can convert between Python datetimes and these values using datetime.datetime.fromtimestamp(ts) and int(dt.timestamp()). But as others have already mentioned, you don't need exact numbers for these parameters: you can use zero for period1 and 2000000000 for period2 for all stocks.

  2. Importantly, the same "crumb" value is valid for downloading all stocks for some time (around one week). So instead of getting a new "crumb" before every download request, you should cache it and refresh it only when you get an "Unauthorized" response; your downloads will run roughly twice as fast. The easiest way to get the crumb value is to request the Yahoo main page (https://finance.yahoo.com/) and find the "user":{"crumb":" substring there.
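A minimal sketch of that extraction, using a fabricated page fragment in place of the real Yahoo HTML (the regex mirrors the substring described above; the crumb value here is invented, and real crumbs may contain JSON escape sequences, hence the unicode_escape step):

```python
import re

# Fabricated fragment standing in for the Yahoo Finance page source.
page_source = 'var foo = {"user":{"crumb":"1qRuQ\\u002FKELxmM","firstName":null}};'

# Find the crumb and decode any \uXXXX escapes embedded in the JSON.
match = re.search(r'"user":\{"crumb":"(.*?)"', page_source)
crumb = match.group(1).encode().decode('unicode_escape') if match else None
print(crumb)  # 1qRuQ/KELxmM
```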

Upvotes: 0

Wilfredo

Reputation: 1548

If what you want is the whole history, you don't really need to compute the max date; just use a reasonably old date (1900/01/01 in the example below). For example, assuming you're interested in FB's stock, this should work:

import scrapy
import time


class FinanceSpider(scrapy.Spider):
    name = "finance"
    allowed_domains = ["finance.yahoo.com"]
    start_urls = ['https://finance.yahoo.com/quote/FB']

    def parse(self, response):
        crumb = response.css('script').re_first('user":{"crumb":"(.*?)"').encode().decode('unicode_escape')
        url = ("https://query1.finance.yahoo.com/v7/finance/download/FB" +
               "?period1=-2208988800&period2=" + str(int(time.time())) + "&interval=1d&events=history&" +
               "crumb={}".format(crumb))
        return scrapy.Request(url, callback=self.parse_csv)

    def parse_csv(self, response):
        lines = response.text.strip().split('\n')
        print(lines[0])
        print(lines[1])
        print(lines[-1])

Upvotes: 0

mrCarnivore

Reputation: 5088

After installing pandas-datareader with:

pip install pandas-datareader

You can request the stock prices with this code:

import pandas_datareader as pdr
from datetime import datetime

appl = pdr.get_data_yahoo(symbols='AAPL', start=datetime(2000, 1, 1), end=datetime(2012, 1, 1))
print(appl['Adj Close'])

Upvotes: 2
