Simon Lindgren

Reputation: 2031

Overriding HTTP errors with urllib2

I have this code, but it is not working. I want to use urllib2 to iterate through a list of urls. Upon opening each url, BeautifulSoup locates a class and extracts that text. The program stalls if there is an invalid url in the list. If there is any error, I just want to have 'error' as the text, and for the program to continue on to the next url. Any ideas?

    for url in url_list:
        page = urllib2.urlopen(url)
        soup = BeautifulSoup(page.read())

        text = soup.find_all(class_='ProfileHeaderCard-locationText u-dir')
        if text is not None:
            for t in text:
                text2 = t.get_text().encode('utf-8')
        else:
            text2 = 'error'

Upvotes: 0

Views: 55

Answers (2)

Alex Martelli

Reputation: 881575

try/except is your friend! Change your code to something like...:

    for url in url_list:
        try:
            page = urllib2.urlopen(url)
        except urllib2.URLError:
            text2 = 'error'
        else:
            soup = BeautifulSoup(page.read())
            text = soup.find_all(class_='ProfileHeaderCard-locationText u-dir')
            if text:
                for t in text:
                    text2 = t.get_text().encode('utf-8')
            else:
                text2 = 'error'

Upvotes: 3

vkorchagin

Reputation: 656

urllib2.urlopen raises a URLError on failure, as you can find in the docs.

Use a try/except block:

    try:
        page = urllib2.urlopen(url)
    except urllib2.URLError as e:
        print e
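
For context, here is a minimal sketch of how that try/except might slot into the original loop, assuming Python 2 with urllib2 and BeautifulSoup from bs4 installed; url_list is a hypothetical placeholder for your own list of URLs:

    import urllib2
    from bs4 import BeautifulSoup

    url_list = ['http://example.com/profile']  # hypothetical placeholder list of URLs

    for url in url_list:
        try:
            page = urllib2.urlopen(url)
        except urllib2.URLError as e:
            print e            # log what went wrong
            text2 = 'error'    # fall back to 'error' and move on to the next URL
            continue

        soup = BeautifulSoup(page.read())
        text = soup.find_all(class_='ProfileHeaderCard-locationText u-dir')
        if text:
            for t in text:
                text2 = t.get_text().encode('utf-8')
        else:
            text2 = 'error'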

Upvotes: 3
