Michael

Reputation: 21

Scraping Yahoo Earning Calendar

I am new to programming in Python. I am trying to get the Symbol and Time columns from the Yahoo earnings calendar.

I am able to get the time in a form that works for me. It comes out as just the first word, which is sometimes an actual time rather than 'after/before' market close.

But when it comes to the symbol, I don't want any foreign markets, so nothing with a .?? suffix in the symbol. Here is what I have so far. Sorry if it is a little sloppy; it is my first real program in Python.

import requests
import urllib2
import re
from bs4 import BeautifulSoup

site= "http://www.nasdaq.com/earnings/report/acrx"
hdr = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11',
       'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
       'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
       'Accept-Encoding': 'none',
       'Accept-Language': 'en-US,en;q=0.8'}

url = "http://biz.yahoo.com/research/earncal/20150309.html"

content = urllib2.urlopen(url).read()

soup = BeautifulSoup(content)

m = re.findall('center><small>\S+ ', content)
w = re.findall('\?s=\w+',content)

x=0
lp = (len(m))
xlp = lp -1


for x in range (xlp):
    print x, m[x+1][14:], w[x][3:]

Upvotes: 2

Views: 2742

Answers (1)

ReverseEngineerFox

Reputation: 41

Instead of using regex, you can use BeautifulSoup to parse the HTML. You can also use the time library to build the date string for the current day.
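As a rough illustration of the date part, the sketch below formats a date as YYYYMMDD, the shape used in the calendar URL (for example 20150309). The function name earnings_date_string is hypothetical and not part of any library; only time.localtime and time.strftime from the standard library are used.

```python
import time


def earnings_date_string(when=None):
    """Return a date as a YYYYMMDD string (hypothetical helper).

    `when` is a time.struct_time; if omitted, today's local date is used.
    """
    if when is None:
        when = time.localtime()
    return time.strftime("%Y%m%d", when)


if __name__ == "__main__":
    # Prints today's date in the same shape as the "20150309" in the URL.
    print(earnings_date_string())
```

Whether the calendar actually has a page for a given day (weekends, holidays) is not something this sketch checks.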

Here's some example code for scraping Yahoo Finance.

import requests
import bs4

def get_earning_data(date):
    # date is a string such as "20150309" (YYYYMMDD).
    url = "https://biz.yahoo.com/research/earncal/{}.html".format(date)
    headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.3; rv:36.0) Gecko/20100101 Firefox/36.0"}
    html = requests.get(url, headers=headers).text
    soup = bs4.BeautifulSoup(html)
    quotes = []
    for tr in soup.find_all("tr"):
        cells = tr.contents
        # Skip rows that are too short or whose second cell is empty.
        if len(cells) <= 3 or len(cells[1].contents) == 0:
            continue
        link = cells[1].contents[0]
        # Data rows have a link to the Yahoo quote page in the second cell.
        if link.name != "a" or not link["href"].startswith("http://finance.yahoo.com/q?s="):
            continue
        # Rows with six cells include an EPS estimate; the others do not.
        has_eps = len(cells) == 6
        quotes.append({
            "name": cells[0].text,
            "symbol": link.text,
            "url": link["href"],
            "eps": cells[2].text if has_eps else u'N/A',
            "time": cells[3].text if has_eps else cells[2].text,
        })
    return quotes

print(get_earning_data("20150309"))
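The question also asks to leave out foreign listings, meaning symbols with a suffix such as .XX. A minimal sketch of that filter, assuming the input is a list of dicts with a "symbol" key like the one returned by get_earning_data, and assuming that any symbol containing a dot counts as foreign (the helper name drop_foreign_symbols and the sample rows are made up for illustration):

```python
def drop_foreign_symbols(quotes):
    """Keep only entries whose symbol has no '.' in it (hypothetical helper).

    Assumes each entry is a dict with a "symbol" key, and that a dot in
    the symbol marks a non-US listing, as described in the question.
    """
    return [quote for quote in quotes if "." not in quote["symbol"]]


if __name__ == "__main__":
    # Made-up sample rows, not real calendar data.
    sample = [
        {"symbol": "ACRX", "time": "After Market Close"},
        {"symbol": "BMW.DE", "time": "Time Not Supplied"},
    ]
    for quote in drop_foreign_symbols(sample):
        print(quote["symbol"], quote["time"])
```

Note that this rule would also drop US tickers that happen to contain a dot, so it may need tightening depending on the symbols you care about.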

Upvotes: 4
