user2556720
user2556720

Reputation:

Python how to add exception?

@martineau I have updated my codes, is this what you meant ? How do i handle KeyError instead of NameError ?

url = "http://app2.nea.gov.sg/anti-pollution-radiation-protection/air-pollution/psi/psi-readings-over-the-last-24-hours"
web_soup = soup(urllib2.urlopen(url))

table = web_soup.find(name="div", attrs={'class': 'c1'}).find_all(name="div")[4].find_all('table')[0]

data = {}
cur_time = datetime.datetime.strptime("12AM", "%I%p")
for tr_index, tr in enumerate(table.find_all('tr')):
    if 'Time' in tr.text:
        continue
    for td_index, td in enumerate(tr.find_all('td')):
        if not td_index:
            continue
        data[cur_time] = td.text.strip()

        if td.find('strong'):
            bold_time = cur_time
            data[bold_time] = '20'
        cur_time += datetime.timedelta(hours=1)

        default_value = '20' # whatever you want it to be

    try:
        bold = data[bold_time]
    except NameError:

        bold_time = beforebold = beforebeforebold = default_value
    # might want to set "bold" to something, too, if needed
    else:   
        beforebold = data.get(bold_time - datetime.timedelta(hours=1)) 
        beforebeforebold =  data.get(bold_time - datetime.timedelta(hours=2))

This is where I print my data to do calculation.

print bold
print beforebold
print beforebeforebold

Upvotes: 0

Views: 247

Answers (2)

eyquem
eyquem

Reputation: 27575

I had read your previous post before it disappeared, and then I've read this one.
I find it a pity to use BeautifulSoup for your goal, because, from the code I see, I find its use complicated, and the fact is that regexes run roughly 10 times faster than BeautifulSoup.

Here's the code with only re, that furnishes the data you are interested in.
I know, there will people to say that HTML text can't be parsed by regexs. I know, I know... but I don't parse the text, I directly find the chunks of text that are interesting. The source code of the webpage of this site is apparently very well structured and it seems there is little risk of bugs. Moreover, tests and verification can be added to keep watch on the source code and to be instantly informed on the possible changings made by the webmaster in the webpage

import re
from httplib import HTTPConnection

hypr = HTTPConnection(host='app2.nea.gov.sg',
                      timeout = 300)
rekete = ('/anti-pollution-radiation-protection/'
          'air-pollution/psi/'
          'psi-readings-over-the-last-24-hours')

hypr.request('GET',rekete)
page = hypr.getresponse().read()


patime = ('PSI Readings.+?'
          'width="\d+%" align="center">\r\n'
          ' *<strong>Time</strong>\r\n'
          ' *</td>\r\n'
          '((?: *<td width="\d+%" align="center">'
          '<strong>\d+AM</strong>\r\n'
          ' *</td>\r\n)+.+?)'

          'width="\d+%" align="center">\r\n'
          ' *<strong>Time</strong>\r\n'
          ' *</td>\r\n'
          '((?: *<td width="\d+%" align="center">'
          '<strong>\d+PM</strong>\r\n'
          ' *</td>\r\n)+.+?)'
          'PM2.5 Concentration')
rgxtime = re.compile(patime,re.DOTALL)


patline = ('<td align="center">\r\n'
           ' *<strong>'             # next line = group 1
           '(North|South|East|West|Central|Overall Singapore)'
           '</strong>\r\n'
           ' *</td>\r\n'
           '((?: *<td align="center">\r\n'  # group 2 start
           ' *[.\d-]+\r\n'                  #
           ' *</td>\r\n)*)'                 # group 2 end

           ' *<td align="center">\r\n'
           ' *<strong style[^>]+>'
           '([.\d-]+)' # group 3
           '</strong>\r\n'
           ' *</td>\r\n')
rgxline = re.compile(patline)

rgxnb = re.compile('<td align="center">\r\n'
                   ' *([.\d-]+)\r\n'
                   ' *</td>\r\n')


m= rgxtime.search(page)

a,b = m.span(1) # m.group(1) contains the data AM
d = dict((mat.group(1),
          rgxnb.findall(mat.group(2))+[mat.group(3)])
         for mat in rgxline.finditer(page[a:b]))

a,b = m.span(2) # m.group(2) contains the data PM
for mat in rgxline.finditer(page[a:b]):
    d[mat.group(1)].extend(rgxnb.findall(mat.group(2))+[mat.group(3)])


print 'last 3 values'
for k,v in d.iteritems():
    print '%s  :  %s' % (k,v[-3:])

Upvotes: 0

martineau
martineau

Reputation: 123433

You need to add something to set data[bold_time]:

    if td.find('strong'):
        bold_time = cur_time
        data[bold_time] = ????? # whatever it should be
    cur_time += datetime.timedelta(hours=1)

This should avoid both the NameError and KeyError exceptions as long as the word strong is found. You still might want to code defensively and handle one or both of them gracefully. That what exception where meant to do, handle those exceptional cases that shouldn't happen...

Upvotes: 1

Related Questions