Reputation: 5004
I have developed some code that I use for web scraping:
import urllib2

link = 'http://www.cmegroup.com' + div.findAll('a')[3]['href']
user_agent = 'Mozilla/5.0'
headers = {'User-Agent': user_agent}
req = urllib2.Request(link, headers=headers)
page = urllib2.urlopen(req).read()
However, what I don't understand is that I sometimes get an error when requesting the link, and sometimes I don't. For example, this error:
urllib2.URLError: <urlopen error [Errno -2] Name or service not known>
came out for this link:
http://www.cmegroup.com/trading/energy/refined-products/mini-european-naphtha-platts-cif-nwe-swap-futures_product_calendar_futures.html
When I re-run the code, I won't get an error for this link again, but I will for some other link. Could this be due to my wireless connection?
Upvotes: 1
Views: 3023
Reputation: 19432
This looks like a DNS or network problem. If you run the same code for the same URL several times and it sometimes works but sometimes doesn't, the problem is probably not your code.
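Since transient failures like this often succeed on a second attempt, one workaround is a small retry wrapper. This is a minimal sketch; the function name, defaults, and structure are my own, not from the question:

```python
import time

def fetch_with_retries(open_func, retries=3, delay=1.0,
                       exceptions=(IOError,)):
    """Call open_func() and retry on transient errors.

    open_func is any zero-argument callable, e.g.
    lambda: urllib2.urlopen(req).read()
    (urllib2.URLError is a subclass of IOError, so it is
    caught by the default exceptions tuple).
    """
    for attempt in range(retries):
        try:
            return open_func()
        except exceptions:
            if attempt == retries - 1:
                raise  # give up after the last attempt
            time.sleep(delay)  # brief pause before retrying
```

With the question's code, you would call it as `page = fetch_with_retries(lambda: urllib2.urlopen(req).read())`.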
To debug the issue, you could wrap the statement in a try-except block and start pdb (or ipdb, if installed) from there:
try:
    response = urllib2.urlopen(req)
except urllib2.URLError as ex:
    import pdb; pdb.set_trace()  # use ipdb.set_trace() if ipdb is installed
else:
    page = response.read()
Then you can inspect the response, the status code, the exception traceback, and so on.
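As a rough illustration of what to look for in that debugging session (the helper name is my own invention): urllib2.URLError stores the original socket error in its `.reason` attribute, and the "[Errno -2] Name or service not known" message from the question corresponds to a `socket.gaierror`, i.e. a failed DNS lookup:

```python
import socket

def classify_network_error(exc):
    """Roughly classify what caused a failed urlopen call.

    urllib2.URLError wraps the underlying socket error in .reason;
    a socket.gaierror there means name resolution (DNS) failed,
    which matches '[Errno -2] Name or service not known'.
    """
    reason = getattr(exc, 'reason', exc)
    if isinstance(reason, socket.gaierror):
        return 'dns'      # DNS lookup failed -- often transient
    if isinstance(reason, socket.timeout):
        return 'timeout'  # server too slow or network stalled
    return 'other'        # e.g. HTTP errors, connection refused
```

A 'dns' result on an intermittently failing link would support the flaky-resolver/flaky-connection theory.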
(As a side note, if external dependencies are not a problem, I'd strongly recommend using the requests package instead of urllib2.)
Upvotes: 2