TaL

Reputation: 183

Scraping in Yahoo Finance the Analysis tab with Python

I'm trying to extract the value of "Next 5 Years (per annum)" for the stock BABA from the Yahoo Finance "Analysis" tab: https://finance.yahoo.com/quote/BABA/analysis?p=BABA. (It's 2.85%, in the second row from the bottom.)

I have been trying to follow these questions:

Scrape Yahoo Finance Financial Ratios

Scrape Yahoo Finance Income Statement with Python

But I can't even extract the data from the page.

I tried this website as well:

https://hackernoon.com/scraping-yahoo-finance-data-using-python-ayu3zyl

This is the code I wrote to get the web page data.

First import the packages:

import requests
from bs4 import BeautifulSoup

Then I try to extract the data from the page:

Url= "https://finance.yahoo.com/quote/BABA/analysis?p=BABA"
r = requests.get(Url)
data = r.text
soup = BeautifulSoup(data,features="lxml")

When looking at the types of the "data" and "soup" objects, I see that

type(data)
<class 'str'>

I can somehow extract the needed data from the ">Next 5 Years" row using regular expressions.

But when looking at

type(soup)
<class 'bs4.BeautifulSoup'>

the data in it does not seem to relate to the page for some reason.

It looks like this (I copied only a small part of what is in the soup object):

soup
<!DOCTYPE html>
<html class="NoJs featurephone" id="atomic" lang="en-US"><head prefix="og: 
http://ogp.me/ns#"><script>window.performance && window.performance.mark &&  
window.performance.mark('PageStart');</script><meta charset="utf-8"/> 
<title>Alibaba Group Holding Limited (BABA) Analyst Ratings, Estimates &amp; 
Forecasts - Yahoo Finance</title><meta con 
tent="recommendation,analyst,analyst 
rating,strong buy,strong 
sell,hold,buy,sell,overweight,underweight,upgrade,downgrade,price target,EPS 
estimate,revenue estimate,growth estimate,p/e 
estimate,recommendation,analyst,analyst rating,strong buy,strong 
sell,hold,buy,sell,overweight,underweight,upgrade,downgrade,price target,EPS 
estimate,revenue estimate,growth estimate,p/e estimate" name="keywords"/> 
<meta   content="on" http-equiv="x-dns-prefetch-control"/><meta content="on" 
property="twitter:dnt"/><meta content="90376669494" property="fb:app_id"/> 
<meta content="#400090" name="theme-color"/><meta content="width=device- 
width, 

  1. Is there any way to extract the needed data from the data object other than regular expressions?
  2. How does the soup object help me extract the data? (I see it is used a lot, but I'm not sure how to make it useful.)

Thanks in Advance

Upvotes: 1

Views: 1873

Answers (2)

Brad123

Reputation: 944

Here's what I have. The issue I'm running into is a rate limit: after a certain number of requests, I'm no longer able to get the information.

import os

import requests
from bs4 import BeautifulSoup


def yahoo_growth_soup(ticker, debug_mode=False):
    """
    Returns the growth estimate for a ticker from Yahoo Finance.
    """
    # Set up headers to avoid getting blocked by Yahoo Finance
    headers = {'User-Agent': "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.135 Safari/537.36 Edge/12.246"}
    # url = f"https://finance.yahoo.com/quote/{ticker}/analysis?p={ticker}"
    url = f"https://finance.yahoo.com/quote/{ticker}/analysis"
    r = requests.get(url, headers=headers)
    soup = BeautifulSoup(r.content, "html5lib")

    if debug_mode:
        # Dump the parsed page to disk for inspection
        output_file_path = r'E:\Finance\Python Scripts\debug'
        output_file_name = 'soup_output.txt'
        with open(os.path.join(output_file_path, output_file_name), "w", encoding="utf-8") as output_file:
            output_file.write(str(soup))

    # Find the table cell that holds the row label
    value_element = soup.find("td", string="Next 5 Years (per annum)")

    if value_element:
        # The estimate sits in the cell right after the label
        value = value_element.find_next_sibling("td").text
        if value == '--':
            return 0.0
        growth_est = float(value.strip('%').replace(',', ''))
    else:  # value_element is None
        # print('Unable to locate Yahoo Finance Growth Estimate')
        return None

    # Convert the percentage to a fraction, rounded to three decimals
    return round(growth_est / 100.0, 3)
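
For the rate limit mentioned above, one possible workaround is to retry with increasing pauses. The helper below is only a sketch: `call_with_backoff` and its parameters are hypothetical names, and it assumes that a `None` result from `yahoo_growth_soup` means the request was throttled, although it could also mean the page layout changed. Whether Yahoo Finance lets requests through again after such pauses is not something I can confirm.

```python
import time


def call_with_backoff(fetch, retries=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it returns a non-None value or retries run out.

    fetch:      zero-argument callable, e.g. lambda: yahoo_growth_soup("BABA")
    base_delay: first wait in seconds; doubled after every failed attempt
    sleep:      injectable so the waits can be skipped when testing
    """
    delay = base_delay
    for attempt in range(retries):
        result = fetch()
        if result is not None:
            return result
        if attempt < retries - 1:  # no point waiting after the final attempt
            sleep(delay)
            delay *= 2
    return None
```

It would be used as `call_with_backoff(lambda: yahoo_growth_soup("BABA"))`. Passing the fetch step in as a callable keeps the retry logic separate from the scraping code.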

Upvotes: 0

Bertrand Martel

Reputation: 45443

One solution is to extract the value from the JSON data embedded in the page's JavaScript using a regex. The JSON data is assigned to the following variable:

root.App.main = { .... };

Example :

import requests 
import re
import json

r = requests.get("https://finance.yahoo.com/quote/BABA/analysis?p=BABA")

data = json.loads(re.search(r'root\.App\.main\s*=\s*(.*);', r.text).group(1))

field = [t for t in data["context"]["dispatcher"]["stores"]["QuoteSummaryStore"]["earningsTrend"]["trend"] if t["period"] == "+5y" ][0]

print(field)
print("Next 5 Years (per annum) : " + field["growth"]["fmt"])
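
The same extraction can be tried without a network call. The sketch below runs the regex on a small hand-built string that imitates the `root.App.main = {...};` assignment. `five_year_growth`, `payload`, and `sample_html` are hypothetical names, the "+1y" numbers are made up, and it assumes the live page still embeds its data this way, which may no longer hold. It also returns `None` instead of raising when the variable or the "+5y" entry is missing.

```python
import json
import re

# Hand-built stand-in for the page source; only the nesting mirrors the
# path used above, and the "+1y" values are invented for illustration.
trend = [
    {"period": "+1y", "growth": {"raw": 0.2, "fmt": "20.00%"}},
    {"period": "+5y", "growth": {"raw": 0.0285, "fmt": "2.85%"}},
]
payload = {"context": {"dispatcher": {"stores": {
    "QuoteSummaryStore": {"earningsTrend": {"trend": trend}}
}}}}
sample_html = "<script>root.App.main = " + json.dumps(payload) + ";\n</script>"


def five_year_growth(page_text):
    """Return the formatted "+5y" growth string, or None if it is missing."""
    match = re.search(r"root\.App\.main\s*=\s*(.*);", page_text)
    if match is None:
        return None
    data = json.loads(match.group(1))
    stores = data.get("context", {}).get("dispatcher", {}).get("stores", {})
    summary = stores.get("QuoteSummaryStore", {})
    for entry in summary.get("earningsTrend", {}).get("trend", []):
        if entry.get("period") == "+5y":
            return entry.get("growth", {}).get("fmt")
    return None


print(five_year_growth(sample_html))
```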

Upvotes: -1
