Reputation: 2031
I have this code, but it is not working. I want to use urllib2 to iterate through a list of URLs. For each URL it opens, BeautifulSoup should locate a class and extract its text. The program stalls if there is an invalid URL in the list. If there is any error, I just want 'error' as the text and for the program to continue on to the next URL. Any ideas?
for url in url_list:
    page = urllib2.urlopen(url)
    soup = BeautifulSoup(page.read())
    text = soup.find_all(class_='ProfileHeaderCard-locationText u-dir')
    if text is not None:
        for t in text:
            text2 = t.get_text().encode('utf-8')
    else:
        text2 = 'error'
Upvotes: 0
Views: 55
Reputation: 881575
try/except is your friend! Change your code to something like:
for url in url_list:
    try:
        page = urllib2.urlopen(url)
    except urllib2.URLError:
        text2 = 'error'
    else:
        soup = BeautifulSoup(page.read())
        text = soup.find_all(class_='ProfileHeaderCard-locationText u-dir')
        if text:
            for t in text:
                text2 = t.get_text().encode('utf-8')
        else:
            text2 = 'error'
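For completeness, here is a minimal self-contained sketch of the same idea, assuming Python 2 with urllib2 and bs4 installed and a hypothetical url_list; it collects one result per URL and falls back to 'error' whenever the fetch or the lookup fails:

import urllib2
from bs4 import BeautifulSoup

url_list = ['http://example.com']  # hypothetical list of URLs

results = []
for url in url_list:
    try:
        page = urllib2.urlopen(url)
    except urllib2.URLError:
        # Unreachable host, DNS failure, HTTP error, etc.:
        # record the failure and move on to the next URL.
        results.append('error')
        continue
    soup = BeautifulSoup(page.read())
    tags = soup.find_all(class_='ProfileHeaderCard-locationText u-dir')
    if tags:
        # Keep the text of the first match, encoded as UTF-8 bytes.
        results.append(tags[0].get_text().encode('utf-8'))
    else:
        results.append('error')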
Upvotes: 3
Reputation: 656
urllib2.urlopen raises URLError on error, as you can find in the docs.
Use a try/except block:
try:
    page = urllib2.urlopen(url)
except urllib2.URLError as e:
    print e
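Note that HTTPError is a subclass of URLError, so the handler above also catches HTTP status errors. If you want to report them separately, a sketch along these lines (still Python 2 / urllib2) works, since HTTPError exposes the status code and URLError the failure reason:

try:
    page = urllib2.urlopen(url)
except urllib2.HTTPError as e:
    print 'HTTP error:', e.code      # e.g. 404
except urllib2.URLError as e:
    print 'URL error:', e.reason     # e.g. DNS failure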
Upvotes: 3