Reputation: 519
I found this working code snippet in another question: Web scraping of Yahoo Finance statistics using BS4
Here is the code I am referring to:
import requests, re, json, pprint
p = re.compile(r'root\.App\.main = (.*);')
tickers = ['NKE','AAPL','SPG']
results = {}
with requests.Session() as s:
    for ticker in tickers:
        r = s.get('https://finance.yahoo.com/quote/{}/key-statistics?p={}'.format(ticker,ticker))
        data = json.loads(p.findall(r.text)[0])
        key_stats = data['context']['dispatcher']['stores']['QuoteSummaryStore']
        res = {
            'Enterprise Value' : key_stats['defaultKeyStatistics']['enterpriseValue']['fmt']
            ,'Trailing P/E' : key_stats['summaryDetail']['trailingPE']['fmt']
            ,'Forward P/E' : key_stats['summaryDetail']['forwardPE']['fmt']
            ,'PEG Ratio (5 yr expected)' : key_stats['defaultKeyStatistics']['pegRatio']['fmt']
            ,'Return on Assets' : key_stats['financialData']['returnOnAssets']['fmt']
            ,'Quarterly Revenue Growth' : key_stats['financialData']['revenueGrowth']['fmt']
            ,'EBITDA' : key_stats['financialData']['ebitda']['fmt']
            ,'Diluted EPS' : key_stats['defaultKeyStatistics']['trailingEps']['fmt']
            ,'Total Debt/Equity' : key_stats['financialData']['debtToEquity']['fmt']
            ,'Current Ratio' : key_stats['financialData']['currentRatio']['fmt']
        }
        results[ticker] = res
pprint.pprint(results)
I have tested this code and it works. However, I am new to JSON, and while I understand at a high level how this code works, I am unsure of the mechanics of several sections.
I would be very grateful for some commentary or explanation of how these sections work:
Question 1: How does this regex work with respect to the web page? I haven't seen a regex that looks like this before.
p = re.compile(r'root\.App\.main = (.*);')
Question 2: I didn't realise that the key statistics on the page were broken up into context, dispatcher, stores and QuoteSummaryStore. How does this code block work, and where can a newbie look to find more info about it?
key_stats = data['context']['dispatcher']['stores']['QuoteSummaryStore']
Question 3: How does someone figure out that Enterprise Value is found at key_stats['defaultKeyStatistics']['enterpriseValue']['fmt']?
res = {
    'Enterprise Value' : key_stats['defaultKeyStatistics']['enterpriseValue']['fmt']
    ,'Trailing P/E' : key_stats['summaryDetail']['trailingPE']['fmt']
    ,'Forward P/E' : key_stats['summaryDetail']['forwardPE']['fmt']
    ,'PEG Ratio (5 yr expected)' : key_stats['defaultKeyStatistics']['pegRatio']['fmt']
    ,'Return on Assets' : key_stats['financialData']['returnOnAssets']['fmt']
    ,'Quarterly Revenue Growth' : key_stats['financialData']['revenueGrowth']['fmt']
    ,'EBITDA' : key_stats['financialData']['ebitda']['fmt']
    ,'Diluted EPS' : key_stats['defaultKeyStatistics']['trailingEps']['fmt']
    ,'Total Debt/Equity' : key_stats['financialData']['debtToEquity']['fmt']
    ,'Current Ratio' : key_stats['financialData']['currentRatio']['fmt']
}
Thanks in advance.
Upvotes: 1
Views: 1100
Reputation: 519
After examining the output of key_stats in this code, I have a better understanding of how the code spits out the data. I've collated the data that is of interest to me. Hope this helps someone else out in the future.
This basically answers questions 2 and 3.
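The kind of inspection described above can be sketched roughly as follows. This is only an illustration: the `key_stats` dictionary here is a small made-up stand-in for the real QuoteSummaryStore, and `list_paths` is a hypothetical helper name, not something from the original snippet.

```python
# Made-up stand-in for the real QuoteSummaryStore; the values are invented.
key_stats = {
    "defaultKeyStatistics": {
        "enterpriseValue": {"fmt": "1.23B", "longFmt": "1,234,567,890"},
        "pegRatio": {"fmt": "2.50"},
    },
    "summaryDetail": {
        "trailingPE": {"fmt": "30.10"},
    },
}


def list_paths(store):
    """Return (section, field, sub-keys) for every field in every section."""
    paths = []
    for section, fields in store.items():
        if not isinstance(fields, dict):
            continue  # skip anything that is not a nested section
        for field, value in fields.items():
            subkeys = sorted(value) if isinstance(value, dict) else []
            paths.append((section, field, subkeys))
    return paths


# Print each lookup path together with the sub-keys available under it.
for section, field, subkeys in list_paths(key_stats):
    print(f"key_stats[{section!r}][{field!r}] -> {subkeys}")
```

Running the same loop over the real `key_stats` should list every section and field name, which is one way to work out where a figure such as Enterprise Value lives.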
I am still confused about Question 1 though.
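One way to poke at the regex is to run it against a small made-up string instead of the real page. In the pattern, `\.` matches a literal dot, `(.*)` captures everything after `root.App.main = `, and because `.` does not match newlines by default, the greedy `.*` stops at the last `;` on that line. The `html` string below is invented for illustration and only assumes the page source contains a JavaScript assignment of that shape.

```python
import json
import re

p = re.compile(r'root\.App\.main = (.*);')

# Invented stand-in for the page source; the real page is far larger.
html = '''<script>
root.App.now = 1;
root.App.main = {"context": {"dispatcher": {"stores": {"QuoteSummaryStore": {"x": 1}}}}};
}(this));
</script>'''

# findall returns a list of the captured groups; take the first match.
captured = p.findall(html)[0]

# The captured text is a JSON object, so json.loads turns it into nested dicts.
data = json.loads(captured)
print(data["context"]["dispatcher"]["stores"]["QuoteSummaryStore"])
```

The captured group is the text between `root.App.main = ` and the final `;` on that line, which is why it can be handed straight to `json.loads`.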
Here is the majority of the useful data in the output:
res = {
'Enterprise Value' : key_stats['defaultKeyStatistics']['enterpriseValue']['fmt']
,'Enterprise Value over Revenue' : key_stats['defaultKeyStatistics']['enterpriseToRevenue']['fmt']
,'Profit Margin' : key_stats['defaultKeyStatistics']['profitMargins']['fmt']
,'Enterprise Value over EBITDA' : key_stats['defaultKeyStatistics']['enterpriseToEbitda']['fmt']
,'Forward EPS' : key_stats['defaultKeyStatistics']['forwardEps']['fmt']
,'Trailing EPS' : key_stats['defaultKeyStatistics']['trailingEps']['fmt']
,'Shares Outstanding' : key_stats['defaultKeyStatistics']['sharesOutstanding']['fmt']
,'Book Value' : key_stats['defaultKeyStatistics']['bookValue']['fmt']
,'Shares Short' : key_stats['defaultKeyStatistics']['sharesShort']['fmt']
,'Shares Short Pct Out' : key_stats['defaultKeyStatistics']['sharesPercentSharesOut']['fmt']
,'Held Pct by Institutions' : key_stats['defaultKeyStatistics']['heldPercentInstitutions']['fmt']
,'Held Pct by Insiders' : key_stats['defaultKeyStatistics']['heldPercentInsiders']['fmt']
,'Net Income to Common Stock' : key_stats['defaultKeyStatistics']['netIncomeToCommon']['fmt']
,'Short Ratio' : key_stats['defaultKeyStatistics']['shortRatio']['fmt']
,'Float' : key_stats['defaultKeyStatistics']['floatShares']['fmt']
,'Price to Sales Trl 12 Mths' : key_stats['defaultKeyStatistics']['priceToSalesTrailing12Months']['fmt']
,'PEG Ratio (5 yr expected)' : key_stats['defaultKeyStatistics']['pegRatio']['fmt']
,'YTD Return' : key_stats['defaultKeyStatistics']['ytdReturn']['fmt']
, 'Diluted EPS' : key_stats['defaultKeyStatistics']['trailingEps']['fmt']
,'Trailing P/E' : key_stats['summaryDetail']['trailingPE']['fmt']
,'Forward P/E' : key_stats['summaryDetail']['forwardPE']['fmt']
,'Open' : key_stats['summaryDetail']['regularMarketOpen']['fmt']
,'High' : key_stats['summaryDetail']['regularMarketDayHigh']['fmt']
,'Low' : key_stats['summaryDetail']['regularMarketDayLow']['fmt']
,'Close' : key_stats['summaryDetail']['regularMarketPrice']['fmt']
,'Previous Close' : key_stats['summaryDetail']['regularMarketPreviousClose']['fmt']
,'Avg 10 Day Volume' : key_stats['summaryDetail']['averageDailyVolume10Day']['fmt']
,'Avg 3 Mth Volume' : key_stats['summaryDetail']['averageDailyVolume3Month']['fmt']
,'Volume' : key_stats['summaryDetail']['regularMarketVolume']['fmt']
,'Market Capitalisation' : key_stats['summaryDetail']['marketCap']['longFmt']
,'Dividend Rate' : key_stats['summaryDetail']['dividendRate']['longFmt']
,'Trailing Ann Div Yld' : key_stats['summaryDetail']['trailingAnnualDividendYield']['longFmt']
,'Trailing Ann Div Rate' : key_stats['summaryDetail']['trailingAnnualDividendRate']['longFmt']
,'Payout Ratio' : key_stats['summaryDetail']['payoutRatio']['longFmt']
,'Total Assets' : key_stats['summaryDetail']['TotalAssets']['longFmt']
,'Price To Sales Trl 12 Mths' : key_stats['summaryDetail']['priceToSalesTrailing12Months']['longFmt']
,'Five Yr Avg Div Yld' : key_stats['summaryDetail']['fiveYearAvgDividendYield']['longFmt']
,'Dividend Yield' : key_stats['summaryDetail']['dividendYield']['longFmt']
, 'EBITDA Margins' : key_stats['financialData']['ebitdaMargins']['fmt']
, 'Profit Margins' : key_stats['financialData']['profitMargins']['fmt']
, 'Gross Margins' : key_stats['financialData']['grossMargins']['fmt']
, 'Operating Cash Flow' : key_stats['financialData']['operatingCashflow']['fmt']
, 'Revenue Growth' : key_stats['financialData']['revenueGrowth']['fmt']
, 'Operating Margins' : key_stats['financialData']['operatingMargins']['fmt']
, 'EBITDA' : key_stats['financialData']['ebitda']['fmt']
, 'Target Low Price' : key_stats['financialData']['targetLowPrice']['fmt']
, 'Gross Profits' : key_stats['financialData']['grossProfits']['fmt']
, 'Free Cash Flow' : key_stats['financialData']['freeCashflow']['fmt']
, 'Target Median Price' : key_stats['financialData']['targetMedianPrice']['fmt']
, 'Earnings Growth' : key_stats['financialData']['earningsGrowth']['fmt']
, 'Current Ratio' : key_stats['financialData']['currentRatio']['fmt']
, 'Return on Assets' : key_stats['financialData']['returnOnAssets']['fmt']
, 'Target Mean Price' : key_stats['financialData']['targetMeanPrice']['fmt']
, 'Total Debt/Equity' : key_stats['financialData']['debtToEquity']['fmt']
, 'Return On Equity' : key_stats['financialData']['returnOnEquity']['fmt']
, 'Target High Price' : key_stats['financialData']['targetHighPrice']['fmt']
, 'Total Cash' : key_stats['financialData']['totalCash']['fmt']
, 'Total Debt' : key_stats['financialData']['totalDebt']['fmt']
, 'Total Revenue' : key_stats['financialData']['totalRevenue']['fmt']
, 'Total Cash Per Share' : key_stats['financialData']['totalCashPerShare']['fmt']
, 'Revenue Per Share' : key_stats['financialData']['revenuePerShare']['fmt']
, 'Quick Ratio' : key_stats['financialData']['quickRatio']['fmt']
, 'Quarterly Revenue Growth' : key_stats['financialData']['revenueGrowth']['fmt']
}
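A direct lookup such as key_stats['summaryDetail']['dividendRate']['longFmt'] raises a KeyError if any level is absent. The sketch below assumes that some fields can be missing or empty for some tickers, which is an assumption rather than something confirmed above. `get_fmt` is a hypothetical helper name, and the sample data is made up.

```python
def get_fmt(store, section, field, subkey="fmt", default=None):
    """Return store[section][field][subkey], or default if any level is missing."""
    fields = store.get(section)
    if not isinstance(fields, dict):
        return default
    value = fields.get(field)
    if not isinstance(value, dict):
        return default
    return value.get(subkey, default)


# Made-up sample: one populated field and one empty field.
sample = {
    "summaryDetail": {
        "trailingPE": {"fmt": "30.10"},
        "dividendRate": {},
    }
}

print(get_fmt(sample, "summaryDetail", "trailingPE"))
print(get_fmt(sample, "summaryDetail", "dividendRate", "longFmt", default="N/A"))
print(get_fmt(sample, "financialData", "ebitda", default="N/A"))
```

With a helper like this, each entry in `res` could fall back to a placeholder instead of stopping the whole loop on the first missing key.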
Upvotes: 1