Reputation: 338
I want to get the maximum available historical price data from Yahoo Finance with Scrapy.
Here is the URL for FB's (Facebook's) max historical price data:
https://query1.finance.yahoo.com/v7/finance/download/FNMA?period1=221115600&period2=1508472000&interval=1d&events=history&crumb=1qRuQKELxmM
In order to write a stock price web crawler, there are two problems I can't solve:
1. How do I get the argument period1?
You can get it by hand on the web page, just by clicking "Max".
How do I get the argument with Python code?
Different stocks have different period1 values.
2. How do I create the argument crumb=1qRuQKELxmM automatically? Different stocks have different crumb values.
Here is my spider for max historical data, written with the Scrapy framework:
import scrapy

class TestSpider(scrapy.Spider):
    name = "quotes"
    allowed_domains = ["finance.yahoo.com"]

    def __init__(self, *args, **kw):
        super().__init__(*args, **kw)
        self.timeout = 10

    def start_requests(self):
        stockName = ...  # get it (code omitted)
        for stock in stockName:
            period1 = ...  # how to fill it?
            crumb = ...    # how to fill it?
            per_stock_max_data = ("https://query1.finance.yahoo.com/v7/finance/"
                                  "download/" + stock + "?period1=" + period1 +
                                  "&period2=1508472000&interval=1d&events=history"
                                  "&crumb=" + crumb)
            yield scrapy.Request(per_stock_max_data, callback=self.parse)

    def parse(self, response):
        content = response.body
        target = response.url
        # do something
How do I fill in the blanks above in my web crawler?
Upvotes: 5
Views: 10490
Reputation: 4480
Came across this thread and wanted to add another option: the Python yfinance package. Its Ticker API has a history method whose time period can be specified as 'max', so that the entirety of the available asset data is returned. Example:
import yfinance as yf
spy = yf.Ticker('SPY').history(
period='max',
interval='1d'
)
The yfinance package uses the pandas package and returns spy as a DataFrame object, like so:
Open High ... Stock Splits Capital Gains
Date ...
1993-01-29 00:00:00-05:00 25.236177 25.236177 ... 0.0 0.0
1993-02-01 00:00:00-05:00 25.236163 25.397589 ... 0.0 0.0
1993-02-02 00:00:00-05:00 25.379641 25.469322 ... 0.0 0.0
1993-02-03 00:00:00-05:00 25.487262 25.738368 ... 0.0 0.0
1993-02-04 00:00:00-05:00 25.810116 25.881861 ... 0.0 0.0
... ... ... ... ... ...
2023-01-24 00:00:00-05:00 398.880005 401.149994 ... 0.0 0.0
2023-01-25 00:00:00-05:00 395.950012 400.700012 ... 0.0 0.0
2023-01-26 00:00:00-05:00 403.130005 404.920013 ... 0.0 0.0
2023-01-27 00:00:00-05:00 403.660004 408.160004 ... 0.0 0.0
2023-01-30 00:00:00-05:00 402.799988 405.119995 ... 0.0 0.0
[7555 rows x 8 columns]
At the time of writing (1/30/2023) this represents the full daily dataset for SPY available via Yahoo Finance. Alternatively, a value of None for the period argument yields the same result. Worth noting: omitting the period argument entirely falls back to the default value of "1mo".
Note: looking at the source code for yfinance reveals that a period of "max" or None triggers a condition whereby a value of -2208994789 (just before the year 1900 in Unix time) is used for the start date.
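That magic number can be sanity-checked with the standard library alone (a quick sketch, nothing yfinance-specific):

```python
from datetime import datetime, timezone

# -2208994789 seconds before the Unix epoch lands in the final hours of
# 1899-12-31 UTC, i.e. effectively the start of the year 1900.
start = datetime.fromtimestamp(-2208994789, tz=timezone.utc)
print(start)  # 1899-12-31 22:20:11+00:00
```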
Upvotes: 1
Reputation: 3854
As I understand it, you want to download all available data for a specific ticker. To do this you actually don't need to provide the period1 parameter: if you pass 0 as period1, the Yahoo API defaults to the oldest available date.
To download quotes the way you showed in the question, we unfortunately have to deal with cookies. I'll provide a solution without using Scrapy; only the ticker itself is required:
import re
import time
from io import StringIO

import pandas as pd
import requests

def get_yahoo_ticker_data(ticker):
    # Fetch the quote page to obtain the 'B' cookie and the matching crumb.
    res = requests.get('https://finance.yahoo.com/quote/' + ticker + '/history')
    yahoo_cookie = res.cookies['B']
    yahoo_crumb = None
    pattern = re.compile(r'.*"CrumbStore":\{"crumb":"(?P<crumb>[^"]+)"\}')
    for line in res.text.splitlines():
        m = pattern.match(line)
        if m is not None:
            yahoo_crumb = m.groupdict()['crumb']
    current_date = int(time.time())
    url_kwargs = {'symbol': ticker, 'timestamp_end': current_date,
                  'crumb': yahoo_crumb}
    url_price = 'https://query1.finance.yahoo.com/v7/finance/download/' \
                '{symbol}?period1=0&period2={timestamp_end}&interval=1d&events=history' \
                '&crumb={crumb}'.format(**url_kwargs)
    response = requests.get(url_price, cookies={'B': yahoo_cookie})
    return pd.read_csv(StringIO(response.text), parse_dates=['Date'])
If you really need the oldest date then you can use the code above and extract the first date from the response.
get_yahoo_ticker_data(ticker='AAPL')
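Given the DataFrame returned above, the oldest quote date is simply the minimum of the Date column. A tiny sketch, with a synthetic frame standing in for the real response:

```python
import pandas as pd

# Synthetic stand-in for the DataFrame get_yahoo_ticker_data() returns.
df = pd.DataFrame({'Date': pd.to_datetime(['1980-12-12', '2017-10-19']),
                   'Close': [0.51, 155.98]})

oldest = df['Date'].min()
print(oldest)  # 1980-12-12 00:00:00
```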
I know that web scraping is not an efficient option, but it's the only one we have because Yahoo has already decommissioned all of its APIs. You might find some third-party solutions, but all of them use scraping inside their source code and add boilerplate that decreases overall performance.
Upvotes: 9
Reputation: 3
If you just put a 0 in place of period1 it should work, because the interval will then run from the beginning of time up to the moment you clicked.
For period2 you can just put a really large int such as 1900000000, and it will take everything that occurs up until that date.
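Putting both suggestions together, the download URL can be assembled like this (build_download_url is a hypothetical helper and the crumb value is just a placeholder):

```python
def build_download_url(symbol, crumb):
    # period1=0 means "from the beginning of time"; period2=1900000000
    # (around the year 2030) is safely in the future, so all data is covered.
    return ("https://query1.finance.yahoo.com/v7/finance/download/"
            f"{symbol}?period1=0&period2=1900000000"
            "&interval=1d&events=history"
            f"&crumb={crumb}")

print(build_download_url("AAPL", "1qRuQKELxmM"))
```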
Upvotes: 0
Reputation: 2212
Both period1 and period2 are "seconds since epoch" values; you can convert between Python timestamps and those values using datetime.datetime.fromtimestamp(ts) and int(dt.timestamp()). But as others have already mentioned, you don't need to specify exact numbers for these parameters: you can use zero for period1 and 2000000000 for period2 for all stocks.
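The conversion in both directions looks like this (the date here is arbitrary):

```python
from datetime import datetime, timezone

# A datetime -> "seconds since epoch", as used by period1/period2.
dt = datetime(2017, 10, 20, tzinfo=timezone.utc)
period2 = int(dt.timestamp())
print(period2)  # 1508457600

# ...and back again.
print(datetime.fromtimestamp(period2, tz=timezone.utc))  # 2017-10-20 00:00:00+00:00
```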
Importantly, the same "crumb" value is valid for downloading all stocks for some time (around a week). So instead of getting a new crumb before every download request, you should cache it and update it only when you get an "Unauthorized" response; your downloads will then run twice as fast. The easiest way to get the crumb value is to request the Yahoo main page (https://finance.yahoo.com/) and find the "user":{"crumb":" substring there.
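The caching idea can be sketched roughly like this (CrumbCache and fetch_crumb are hypothetical names; fetch_crumb would be whatever function scrapes a fresh crumb from the Yahoo main page):

```python
class CrumbCache:
    """Cache a crumb until a download reports it is no longer valid."""

    def __init__(self, fetch_crumb):
        self._fetch = fetch_crumb  # callable that scrapes a fresh crumb
        self._crumb = None

    def get(self):
        # Reuse the cached crumb; fetch a new one only when we have none.
        if self._crumb is None:
            self._crumb = self._fetch()
        return self._crumb

    def invalidate(self):
        # Call this after an "Unauthorized" (HTTP 401) response so the
        # next get() scrapes a fresh crumb.
        self._crumb = None
```

A download loop would call get() for every request and invalidate() only on a 401, so the expensive scrape of the main page happens roughly once a week instead of once per ticker.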
Upvotes: 0
Reputation: 1548
If what you want is the whole history, you don't really need to compute the max date; just use a reasonably old one (in the example below, 1900/01/01). For example, assuming you're interested in FB's stock, this should work:
import time

import scrapy

class FinanceSpider(scrapy.Spider):
    name = "finance"
    allowed_domains = ["finance.yahoo.com"]
    start_urls = ['https://finance.yahoo.com/quote/FB']

    def parse(self, response):
        crumb = response.css('script').re_first('user":{"crumb":"(.*?)"')
        # Unescape sequences like \u002F that appear in the embedded JSON.
        crumb = crumb.encode().decode('unicode_escape')
        url = ("https://query1.finance.yahoo.com/v7/finance/download/FB" +
               "?period1=-2208988800&period2=" + str(int(time.time())) +
               "&interval=1d&events=history&" + "crumb={}".format(crumb))
        return scrapy.Request(url, callback=self.parse_csv)

    def parse_csv(self, response):
        lines = response.text.strip().split('\n')
        print(lines[0])   # CSV header
        print(lines[1])   # first (oldest) row
        print(lines[-1])  # last (newest) row
Upvotes: 0
Reputation: 5088
After installing pandas-datareader with:
pip install pandas-datareader
You can request the stock prices with this code:
import pandas_datareader as pdr
from datetime import datetime
appl = pdr.get_data_yahoo(symbols='AAPL', start=datetime(2000, 1, 1), end=datetime(2012, 1, 1))
print(appl['Adj Close'])
Upvotes: 2