Reputation: 21
I am new to programming in Python. I am trying to get the symbol and the time.
I am able to get the time in a form that works for me. It comes out as just the first word; sometimes that is a clock time rather than 'after/before' the market closes.
For the symbol, though, I don't want any foreign markets, so nothing with a .?? suffix in the symbol. Here is what I have so far. Sorry if it is a little sloppy; it is my first real program in Python.
import requests
import urllib2
import re
from bs4 import BeautifulSoup
site= "http://www.nasdaq.com/earnings/report/acrx"
hdr = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11',
       'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
       'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
       'Accept-Encoding': 'none',
       'Accept-Language': 'en-US,en;q=0.8'}
url = "http://biz.yahoo.com/research/earncal/20150309.html"
content = urllib2.urlopen(url).read()
soup = BeautifulSoup(content)
m = re.findall('center><small>\S+ ', content)
w = re.findall('\?s=\w+',content)
x=0
lp = (len(m))
xlp = lp -1
for x in range(xlp):
    print x, m[x+1][14:], w[x][3:]
Upvotes: 2
Views: 2742
Reputation: 41
Instead of using regex, you can use BeautifulSoup to parse the HTML. You can use the time library to get the current date as well.
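As a minimal sketch of the date part: the earnings calendar URL in the question ends in a date written as YYYYMMDD, so the current date can be formatted with time.strftime. The helper name earnings_date_string is made up for illustration.

```python
import time

def earnings_date_string(when=None):
    """Format a time.struct_time as YYYYMMDD (defaults to the local time now)."""
    if when is None:
        when = time.localtime()
    return time.strftime("%Y%m%d", when)

# Today's date in the form the calendar URL expects.
print(earnings_date_string())
```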
Here's some example code for scraping Yahoo Finance.
import requests
import bs4
def get_earning_data(date):
    html = requests.get("https://biz.yahoo.com/research/earncal/{}.html".format(date), headers={"User-Agent": "Mozilla/5.0 (Windows NT 6.3; rv:36.0) Gecko/20100101 Firefox/36.0"}).text
    soup = bs4.BeautifulSoup(html)
    quotes = []
    for tr in soup.find_all("tr"):
        if len(tr.contents) > 3:
            if len(tr.contents[1].contents) > 0:
                if tr.contents[1].contents[0].name == "a":
                    if tr.contents[1].contents[0]["href"].startswith("http://finance.yahoo.com/q?s="):
                        quotes.append({ "name"  : tr.contents[0].text
                                       ,"symbol": tr.contents[1].contents[0].text
                                       ,"url"   : tr.contents[1].contents[0]["href"]
                                       ,"eps"   : tr.contents[2].text if len(tr.contents) == 6 else u'N/A'
                                       ,"time"  : tr.contents[3].text if len(tr.contents) == 6 else tr.contents[2].text
                                     })
    return quotes
print(get_earning_data("20150309"))
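To leave out the foreign markets the question mentions, one possible approach is to filter the returned list on the "symbol" key. This is only a sketch: it assumes, as the question does, that a dot in the symbol always marks a foreign listing. The helper names and the sample rows are made up for illustration and are not real scraped data.

```python
def is_domestic_symbol(symbol):
    """Return True when the ticker has no '.' in it (e.g. 'ACRX' but not 'BMW.DE')."""
    return "." not in symbol

def drop_foreign_quotes(quotes):
    """Keep only the quote dicts whose 'symbol' value has no dot."""
    return [q for q in quotes if is_domestic_symbol(q["symbol"])]

# Hypothetical rows shaped like the dicts built by get_earning_data.
sample = [
    {"symbol": "ACRX", "time": "After Market Close"},
    {"symbol": "BMW.DE", "time": "Time Not Supplied"},
]
print(drop_foreign_quotes(sample))
```

With the real data this would be called as drop_foreign_quotes(get_earning_data("20150309")).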
Upvotes: 4