user1861156

Reputation: 169

Python Yahoo Stock Exchange (Web Scraping)

I'm having trouble with the following code. It's supposed to print stock prices by accessing Yahoo Finance, but I can't figure out why it's returning empty strings.

import urllib
import re

symbolslist = ["aapl","spy", "goog","nflx"]
i = 0
while i < len(symbolslist):
    url = "http://finance.yahoo.com/q?s="+symbolslist[i]+"&q1=1"
    htmlfile = urllib.urlopen(url)
    htmltext = htmlfile.read()

    regex = '<span id="yfs_l84_' + symbolslist[i] + '">(.+?)</span>'
    pattern = re.compile(regex)
    price = re.findall(pattern,htmltext)
    print price
    i+=1

Edit: It works fine now; it was a syntax error. I have edited the code above as well.

Upvotes: 0

Views: 1941

Answers (1)

Ewan

Reputation: 15058

These are just a few helpful tips for Python development (and scraping):

Python Requests library.

The Python requests library greatly simplifies making HTTP requests.
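As a rough sketch of what that can look like: requests builds the query string from a dict and can raise on HTTP errors for you. This assumes the Yahoo URL from the question (which may no longer serve this page), and `build_quote_url` and `fetch_quote_page` are hypothetical helper names, not part of requests.

```python
import requests

# URL taken from the question; assumed (not verified) to still exist.
BASE_URL = "http://finance.yahoo.com/q"


def build_quote_url(symbol):
    """Return the quote-page URL for one symbol without sending a request."""
    # requests encodes the query string from the params dict.
    request = requests.Request("GET", BASE_URL, params={"s": symbol, "q1": "1"})
    return request.prepare().url


def fetch_quote_page(symbol, timeout=10):
    """Download the quote page for one symbol and return its HTML as text."""
    response = requests.get(BASE_URL, params={"s": symbol, "q1": "1"}, timeout=timeout)
    response.raise_for_status()  # fail loudly instead of parsing an error page
    return response.text


if __name__ == "__main__":
    print(build_quote_url("aapl"))
```

Building the URL separately from fetching it makes it easy to check what would be requested before touching the network.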

No need to use a while loop

A for loop iterates over the list directly, so there is no index variable to manage:

symbolslist = ["aapl","spy", "goog","nflx"]
for symbol in symbolslist:
    pass  # do logic here...

Use xpath over regular expressions

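import requests
import lxml.html  # "import lxml" alone does not load the html submodule

symbol = "aapl"  # any entry from symbolslist
url = "http://www.google.co.uk/finance?q="+symbol+"&q1=1"
r = requests.get(url)
xpath = '//your/xpath'
root = lxml.html.fromstring(r.content)
results = root.xpath(xpath)  # list of matching nodes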
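To make the xpath idea concrete, here is a minimal sketch that runs against a made-up HTML snippet rather than a live page. The markup is an assumption modelled on the span id from the question, and `extract_price` is a hypothetical helper name.

```python
import lxml.html

# Made-up sample markup, modelled on the span id used in the question.
SAMPLE_HTML = """
<html><body>
  <span id="yfs_l84_aapl">123.45</span>
  <span id="yfs_l84_goog">678.90</span>
</body></html>
"""


def extract_price(html, symbol):
    """Return the text of the price span for symbol, or None if it is missing."""
    root = lxml.html.fromstring(html)
    # Pass the id as an XPath variable instead of concatenating it into the query.
    matches = root.xpath("//span[@id=$span_id]/text()", span_id="yfs_l84_" + symbol)
    return matches[0] if matches else None


if __name__ == "__main__":
    print(extract_price(SAMPLE_HTML, "aapl"))
```

Returning None for a missing span avoids the "empty list" confusion from the question: the caller can tell a missing element apart from an empty price.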

No need to compile your regular expressions each time.

Compiling a regex has a cost, so compile it once outside the loop rather than on every iteration.

# The symbol is now a capture group, so the pattern can be compiled once
regex = '<span id="yfs_l84_([a-z]+)">(.+?)</span>'
pattern = re.compile(regex)

for symbol in symbolslist:
    pass  # do logic, e.g. pattern.findall(htmltext)
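One way to hoist the compilation out of the loop is to make the symbol part of the pattern a capture group, so a single compiled pattern serves every symbol. A small sketch using only the standard library and a made-up HTML string (`PRICE_PATTERN` and `extract_prices` are hypothetical names):

```python
import re

# Compiled once; the first group captures the symbol, the second the price.
PRICE_PATTERN = re.compile(r'<span id="yfs_l84_([a-z]+)">(.+?)</span>')

# Made-up sample markup, modelled on the span id used in the question.
SAMPLE_HTML = ('<span id="yfs_l84_aapl">123.45</span>'
               '<span id="yfs_l84_spy">210.10</span>')


def extract_prices(html):
    """Map each symbol found in html to its price string."""
    # findall returns (symbol, price) tuples because the pattern has two groups.
    return dict(PRICE_PATTERN.findall(html))


if __name__ == "__main__":
    for symbol, price in sorted(extract_prices(SAMPLE_HTML).items()):
        print(symbol, price)
```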

External Libraries

As mentioned in the comment by drewk, both pandas and Matplotlib have native functions to get Yahoo quotes, or you can use the ystockquote library to scrape from Yahoo. It is used like so:

#!/usr/bin/env python
import ystockquote

symbolslist = ["aapl","spy", "goog","nflx"]
for symbol in symbolslist:
    print (ystockquote.get_price(symbol))

Upvotes: 1
