Reputation: 1
My script below scrapes a website and returns the data from a table. It's not finished but it works. The problem is that it has no error checking. Where should I have error handling in my script?
There are no unit tests. Should I write some and schedule them to be run periodically, or should the error handling be done in my script?
Any advice on the proper way to do this would be great.
#!/usr/bin/env python
''' Gets the Canadian Monthly Residential Bill Calculations table
from URL and saves the results to a SQLite database.
'''
import urllib2
from BeautifulSoup import BeautifulSoup


class Bills():
    ''' Canadian Monthly Residential Bill Calculations '''
    URL = "http://www.hydro.mb.ca/regulatory_affairs/energy_rates/electricity/utility_rate_comp.shtml"

    def __init__(self):
        ''' Initialization '''
        self.url = self.URL
        self.data = []
        self.get_monthly_residential_bills(self.url)

    def get_monthly_residential_bills(self, url):
        ''' Gets the Monthly Residential Bill Calculations table from URL '''
        doc = urllib2.urlopen(url)
        soup = BeautifulSoup(doc)
        res_table = soup.table.th.findParents()[1]
        results = res_table.findNextSibling()
        header = self.get_column_names(res_table)
        self.get_data(results)
        self.save(header, self.data)

    def get_data(self, results):
        ''' Extracts data from search result. '''
        rows = results.childGenerator()
        data = []
        for row in rows:
            if row == "\n":
                continue
            for td in row.contents:
                if td == "\n":
                    continue
                data.append(td.text)
            self.data.append(tuple(data))
            data = []

    def get_column_names(self, table):
        ''' Gets table title, subtitle and column names '''
        results = table.findAll('tr')
        title = results[0].text
        subtitle = results[1].text
        cols = results[2].childGenerator()
        column_names = []
        for col in cols:
            if col == "\n":
                continue
            column_names.append(col.text)
        return title, subtitle, column_names

    def save(self, header, data):
        pass


if __name__ == '__main__':
    a = Bills()
    for td in a.data:
        print td
Upvotes: 0
Views: 165
Reputation: 25908
The answer to "where should I have error handling in my script?" is basically "any place where something could go wrong", which depends entirely on the logic of your program.
In general, any place where your program relies on an assumption that a particular operation worked as you intended, and there's a possibility that it may not have, you should add code to check whether or not it actually did work, and take appropriate remedial action if it didn't. In some cases, the underlying code might generate an exception on failure and you may be happy to just let the program terminate with an uncaught exception without adding any error-handling code of your own, but (1) this would be, or ought to be, rare if anyone other than you is ever going to use that program; and (2) I'd say this would fall into the "works as intended" category anyway.
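As a minimal sketch of that idea, take the question's `get_column_names`, which assumes the table has at least three header rows. The helper `split_header` below is hypothetical (it is not part of the original script), is written for Python 3, and assumes the rows have already been reduced to lists of cell strings. It checks the assumption and raises a descriptive error instead of an opaque `IndexError`.

```python
def split_header(rows):
    """Split header rows into (title, subtitle, column_names).

    `rows` is assumed to be a list of lists of cell strings, one inner
    list per table row; this shape is an assumption for illustration.
    """
    # The scraper relies on three header rows existing, so verify it.
    if len(rows) < 3:
        raise ValueError(
            "expected at least 3 header rows, got %d" % len(rows))
    title_row, subtitle_row, column_row = rows[:3]
    # An empty column row would make every later data row meaningless.
    if not column_row:
        raise ValueError("column-name row is empty")
    return " ".join(title_row), " ".join(subtitle_row), list(column_row)
```

The caller can then decide whether a `ValueError` here means "retry later", "log and skip", or "stop", which is the remedial action part.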
Upvotes: 0
Reputation: 353
There are a couple of places where you need it. One is when importing modules such as tkinter, whose name differs between Python versions:
    try:
        import Tkinter as tk
    except ImportError:
        import tkinter as tk
Another is anywhere the user enters something that must be of a particular type. A good way to find these spots is to run the program and try really hard to make it crash, e.g. by typing in a value of the wrong type.
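A small sketch of the user-input case, in Python 3. The name `parse_positive_int` is hypothetical and not taken from the question's script; it assumes the input arrives as a string and reports bad input by returning `None` rather than crashing.

```python
def parse_positive_int(text):
    """Return `text` as a positive int, or None if it is not one."""
    try:
        value = int(text.strip())
    except ValueError:
        # Not an integer at all, e.g. "abc", "1.5" or an empty string.
        return None
    # Reject zero and negative numbers as well.
    return value if value > 0 else None
```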
Upvotes: 0
Reputation: 798666
You should write unit tests and you should use exception handling. But only catch the exceptions you can handle; you do no one any favors by catching everything and throwing any useful information out.
Unit tests aren't run periodically though; they're run before and after the code changes (although it is feasible for one change's "after" to become another change's "before" if they're close enough).
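To make this concrete, here is a rough sketch of a unit test for the row-extraction logic, written for Python 3 with the standard `unittest` module. The function `rows_to_tuples` is a hypothetical, network-free stand-in for the question's `get_data`, operating on plain lists instead of BeautifulSoup nodes, and the cell values are made up.

```python
import unittest


def rows_to_tuples(rows):
    """Turn rows of cells into tuples, skipping newline-only entries."""
    result = []
    for row in rows:
        if row == "\n":
            continue
        result.append(tuple(cell for cell in row if cell != "\n"))
    return result


class RowsToTuplesTest(unittest.TestCase):
    def test_skips_newline_entries(self):
        # Made-up fixture: one real row surrounded by whitespace nodes.
        rows = ["\n", ["City A", "\n", "1.00"], "\n"]
        self.assertEqual(rows_to_tuples(rows), [("City A", "1.00")])

    def test_empty_input(self):
        self.assertEqual(rows_to_tuples([]), [])
```

Saved in a module, the tests can be run with `python -m unittest` before and after each change to the parsing code.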
Upvotes: 1
Reputation: 17234
Check the documentation of each function you call and see which exceptions it can raise.
For example, the documentation for urllib2.urlopen() says that it raises URLError on errors, and that URLError is a subclass of IOError.
So, for urlopen(), you could do something like:
    import sys
    import urllib2

    try:
        doc = urllib2.urlopen(url)
    except IOError:
        print >> sys.stderr, 'Error opening URL'
Similarly, do the same for the other calls.
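For readers on Python 3, where `urllib2` was split into `urllib.request` and `urllib.error`, the same pattern might look like the sketch below. The `fetch` function and its `opener` parameter are hypothetical; the parameter is only there so the error path can be exercised without network access.

```python
import sys
import urllib.error
import urllib.request


def fetch(url, opener=urllib.request.urlopen):
    """Return the response body as bytes, or None if the URL fails.

    `opener` is a hypothetical hook for testing; by default it is
    urllib.request.urlopen.
    """
    try:
        response = opener(url)
    except urllib.error.URLError as exc:
        # URLError (and its subclass HTTPError) covers connection and
        # HTTP-level failures; anything else propagates to the caller.
        sys.stderr.write("Error opening %s: %s\n" % (url, exc))
        return None
    try:
        return response.read()
    finally:
        response.close()
```

Only `URLError` is caught here because it is the one failure this function knows how to report; other exceptions are left for the caller to see.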
Upvotes: 2