Regular Expressions Python 3.4

Question

I'm currently starting a web scraper, and it's been a while since I've used Python. I'm sure my code is messy too. Oh well.

def retrieveHTML():
    import re
    import urllib.request
    from urllib.request import urlopen

    urls = ["http://finance.yahoo.com/q?s=^dji", "http://finance.yahoo.com/q?s=^gspc"]
    i = 0
    while i < len(urls):

        htmlfile = urllib.request.urlopen(urls[i])
        htmltext = htmlfile.read()

        if (i == 0):
            regex = b'(.+?)'
        if (i == 1):
            regex = b'(.+?)'

        pattern = re.compile(regex)
        price = pattern.match(htmltext)
        print (price)
        i += 1
retrieveHTML()

The regular expression is intended to find the price of the stock, but the code prints "None". You can find the bit of HTML used in the regex by inspecting the element for the large price at the top of the page, in case there is any ambiguity about that.
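One thing that may be contributing to the "None" result: a compiled pattern's match() only tries to match at the very start of the input, while search() scans the whole input. The sketch below illustrates the difference on a made-up byte string. The markup, the element id "price", and the pattern are all assumptions for illustration, not the real Yahoo page.

```python
import re

# Hypothetical stand-in for the downloaded page (bytes, like htmlfile.read()).
html = b'<html><body><span id="price">17,000.12</span></body></html>'

# Hypothetical pattern; the element id "price" is an assumption.
pattern = re.compile(rb'<span id="price">(.+?)</span>')

# match() only tries at position 0, so it finds nothing here.
print(pattern.match(html))

# search() scans the whole byte string for the first occurrence.
found = pattern.search(html)
price = found.group(1).decode("ascii") if found else None
print(price)
```

If the real pattern starts with a tag that sits somewhere in the middle of the page, swapping match() for search() might be enough.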

Gang Liang · Accepted Answer

I know this is off topic. :)

I would kindly suggest that the OP use XPath from the xml package. I scrape websites like Yahoo as well. The xml package has saved me a lot of time and energy. Doing everything through regex is a pain in the neck.
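As a rough sketch of the XPath idea, assuming "the xml package" means Python's standard-library xml.etree.ElementTree (which supports only a limited XPath subset): the snippet and the element ids below are made up for illustration. Real pages are often not well-formed XML, so a strict XML parser may reject them and a more lenient HTML parser may be needed.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed snippet standing in for part of a page.
snippet = (
    '<div>'
    '<span id="price">17,000.12</span>'
    '<span id="change">+12.3</span>'
    '</div>'
)

root = ET.fromstring(snippet)

# Attribute predicates like [@id='...'] are part of the supported XPath subset.
node = root.find(".//span[@id='price']")
price = node.text if node is not None else None
print(price)
```

The query names the element directly, so there is no need to match the surrounding markup character by character.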
