user1190201

Reputation: 1

What is the right way to handle errors?

My script below scrapes a website and returns the data from a table. It's not finished but it works. The problem is that it has no error checking. Where should I have error handling in my script?

There are no unit tests. Should I write some and schedule them to run periodically, or should the error handling be done in the script itself?

Any advice on the proper way to do this would be great.

#!/usr/bin/env python
''' Gets the Canadian Monthly Residential Bill Calculations table
    from URL and saves the results to a SQLite database.
'''
import urllib2
from BeautifulSoup import BeautifulSoup


class Bills():
    ''' Canadian Monthly Residential Bill Calculations '''

    URL = "http://www.hydro.mb.ca/regulatory_affairs/energy_rates/electricity/utility_rate_comp.shtml"

    def __init__(self):
        ''' Initialization '''

        self.url = self.URL
        self.data = []
        self.get_monthly_residential_bills(self.url)

    def get_monthly_residential_bills(self, url):
        ''' Gets the Monthly Residential Bill Calculations table from URL '''

        doc = urllib2.urlopen(url)
        soup = BeautifulSoup(doc)
        res_table = soup.table.th.findParents()[1]
        results = res_table.findNextSibling()
        header = self.get_column_names(res_table)
        self.get_data(results)
        self.save(header, self.data)

    def get_data(self, results):
        ''' Extracts data from search result. '''

        rows = results.childGenerator()
        data = []
        for row in rows:
            if row == "\n":
                continue
            for td in row.contents:
                if td == "\n":
                    continue
                data.append(td.text)
            self.data.append(tuple(data))
            data = []

    def get_column_names(self, table):
        ''' Gets table title, subtitle and column names '''

        results = table.findAll('tr')
        title = results[0].text
        subtitle = results[1].text
        cols = results[2].childGenerator()
        column_names = []
        for col in cols:
            if col == "\n":
                continue
            column_names.append(col.text)

        return title, subtitle, column_names

    def save(self, header, data):
        pass

if __name__ == '__main__':
    a = Bills()
    for td in a.data:
        print td

Upvotes: 0

Views: 165

Answers (4)

Crowman

Reputation: 25908

The answer to "where should I have error handling in my script?" is basically "any place where something could go wrong", which depends entirely on the logic of your program.

In general, any place where your program relies on an assumption that a particular operation worked as you intended, and there's a possibility that it may not have, you should add code to check whether or not it actually did work, and take appropriate remedial action if it didn't. In some cases, the underlying code might generate an exception on failure and you may be happy to just let the program terminate with an uncaught exception without adding any error-handling code of your own, but (1) this would be, or ought to be, rare if anyone other than you is ever going to use that program; and (2) I'd say this would fall into the "works as intended" category anyway.
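As a rough illustration of checking an assumption instead of trusting it, the sketch below validates the table layout before indexing into it. It is written for Python 3 and works on plain lists of strings rather than BeautifulSoup objects; `TableFormatError` and `split_header` are hypothetical names that do not appear in the question's script.

```python
class TableFormatError(Exception):
    """Raised when the scraped table lacks the expected layout (hypothetical)."""


def split_header(rows):
    """Return (title, subtitle, column_names) from a list of row lists.

    Assumes the layout used in the question: row 0 holds the title,
    row 1 the subtitle, and row 2 the column names.
    """
    # The question's code indexes results[0..2] directly; check first.
    if len(rows) < 3:
        raise TableFormatError(
            "expected at least 3 header rows, got %d" % len(rows))
    title_row, subtitle_row, column_row = rows[0], rows[1], rows[2]
    if not title_row or not subtitle_row or not column_row:
        raise TableFormatError("a header row is empty")
    return title_row[0], subtitle_row[0], list(column_row)
```

If the site changes its markup, this fails with a message that names the broken assumption, rather than an IndexError somewhere deeper in the parsing code.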

Upvotes: 0

clan

Reputation: 353

There are a couple of places where you need it. One is when importing modules whose names differ between Python versions, such as Tkinter:

try:
    import Tkinter as tk
except ImportError:
    import tkinter as tk

Another is anywhere the user enters something that is expected to be of a particular type. A good way to find these places is to run the script and try really hard to make it crash, e.g. by typing in a value of the wrong type.

Upvotes: 0

Ignacio Vazquez-Abrams

Reputation: 798666

You should write unit tests and you should use exception handling. But only catch the exceptions you can handle; you do no one any favors by catching everything and throwing any useful information out.

Unit tests aren't run periodically though; they're run before and after the code changes (although it is feasible for one change's "after" to become another change's "before" if they're close enough).
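A minimal sketch of what such a unit test might look like, assuming Python 3 and the standard `unittest` module. `rows_to_tuples` is a hypothetical, simplified stand-in for the question's `get_data` that takes plain lists instead of BeautifulSoup nodes, and the sample rows are made up.

```python
import unittest


def rows_to_tuples(rows):
    """Drop newline-only entries and turn each remaining row into a tuple."""
    result = []
    for row in rows:
        if row == "\n":
            continue  # whitespace between rows, as in the question's loop
        result.append(tuple(cell for cell in row if cell != "\n"))
    return result


class RowsToTuplesTest(unittest.TestCase):
    def test_skips_newline_entries(self):
        rows = ["\n", ["a", "\n", "b"], "\n"]
        self.assertEqual(rows_to_tuples(rows), [("a", "b")])

    def test_empty_input(self):
        self.assertEqual(rows_to_tuples([]), [])
```

Saved in a test module, this can be run with `python -m unittest` before and after each change to the parsing code.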

Upvotes: 1

shadyabhi

Reputation: 17234

Check the documentation for each function you call and see which exceptions it can raise.

For example, the documentation for urllib2.urlopen() says that it raises URLError on errors, and URLError is a subclass of IOError.

So, for the urlopen(), you could do something like:

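import sys
import urllib2

try:
    doc = urllib2.urlopen(url)
except IOError as e:  # URLError is a subclass of IOError
    print >> sys.stderr, 'Error opening URL: %s' % e
    sys.exit(1)  # doc was never assigned, so stop here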

Similarly, do the same for the other calls.
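For anyone on Python 3, where urllib2 was split into urllib.request and urllib.error, a rough equivalent might look like the sketch below. `fetch` and its `opener` parameter are hypothetical names introduced here so the function can be tried without network access; they are not part of the question's script.

```python
import sys
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def fetch(url, opener=urlopen):
    """Return the response body as bytes, or None if the request failed.

    `opener` is a hypothetical hook for testing; by default it is
    urllib.request.urlopen.
    """
    try:
        response = opener(url)
    except HTTPError as err:
        # HTTPError is a subclass of URLError, so it has to be caught first.
        print("Server returned HTTP %s for %s" % (err.code, url),
              file=sys.stderr)
        return None
    except URLError as err:
        print("Could not reach %s: %s" % (url, err.reason), file=sys.stderr)
        return None
    return response.read()
```

Only the two exceptions the function knows how to report are caught; anything else propagates to the caller. Returning None is one possible design choice, and the caller then has to check for it before parsing.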

Upvotes: 2
