Reputation: 3549
This code works for sites like Google and Yahoo and prints 'good':
import urllib.request as ur
import urllib.error as ure

#url="http://www.evga.com"
#url="http://www.asus.com/us/"
url = "http://www.google.com"

try:
    conn = ur.urlopen(url)
except ure.HTTPError as e:
    # Return code error (e.g. 404, 501, ...)
    # ...
    print('HTTPError: {}'.format(e.code))
except ure.URLError as e:
    # Not an HTTP-specific error (e.g. connection refused)
    # ...
    print('URLError: {}'.format(e.reason))
else:
    # 200
    # ...
    print('good')
For ASUS, however, it reports HTTP error 403, and for EVGA it gets no response at all. How do I troubleshoot this?
Upvotes: 0
Views: 41
Reputation: 268
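You're running into a classic headers problem: urllib identifies itself with a default User-Agent (Python-urllib/3.x), and some sites reject or ignore requests that do not look like they come from a browser. urllib can send custom headers, but in my experience it is awkward to work with.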
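If you want to stay with urllib, the same idea can be sketched with the standard library alone. This is a minimal sketch, not a guaranteed fix: the User-Agent string is just an example browser value, and the names BROWSER_UA, build_request and check are made up for illustration. The timeout is there so that a site that never answers produces an error instead of hanging.

```python
import urllib.error
import urllib.request

# Assumption: an example browser-like User-Agent; any current browser string should do.
BROWSER_UA = ("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) "
              "AppleWebKit/537.36 (KHTML, like Gecko) "
              "Chrome/83.0.4103.97 Safari/537.36")


def build_request(url):
    """Return a Request that carries a browser-like User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": BROWSER_UA})


def check(url, timeout=10):
    """Fetch url and return a short status string instead of raising."""
    try:
        with urllib.request.urlopen(build_request(url), timeout=timeout) as conn:
            return "good ({})".format(conn.status)
    except urllib.error.HTTPError as e:
        # The server answered, but with an error status (e.g. 403, 404).
        return "HTTPError: {}".format(e.code)
    except urllib.error.URLError as e:
        # No usable HTTP answer (e.g. DNS failure, connection refused, connect timeout).
        return "URLError: {}".format(e.reason)
    except OSError as e:
        # A timeout while reading the response is not always wrapped in URLError.
        return "OSError: {}".format(e)
```

Calling check("http://www.asus.com/us/") then prints nothing by itself; wrap it in print() to see which branch you end up in.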
For web scraping I recommend either requests or selenium. The first one is a good start.
Here is a requests version of your code:
import requests

url = "http://www.evga.com"
#url="http://www.asus.com/us/"
#url="http://www.google.com"

# Send a browser-like User-Agent instead of the library's default one.
headers = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36"}
r = requests.get(url, headers=headers)
print(r.status_code)
Yields:
200
I noticed that "http://www.evga.com" is the troublesome one, but with a browser-like User-Agent header it responds normally.
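To keep the error reporting from the question, the requests version can be extended roughly like this. It is only a sketch: make_session and check are hypothetical helper names, the User-Agent is an example value, and the 10-second timeout is an arbitrary choice (requests does not time out by default, so a silent server would otherwise hang the script).

```python
import requests

# Assumption: an example browser-like User-Agent; any current browser string should do.
BROWSER_UA = ("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) "
              "AppleWebKit/537.36 (KHTML, like Gecko) "
              "Chrome/83.0.4103.97 Safari/537.36")


def make_session():
    """Return a Session that sends the browser-like User-Agent on every request."""
    session = requests.Session()
    session.headers.update({"User-Agent": BROWSER_UA})
    return session


def check(url, session, timeout=10):
    """Fetch url and return a short status string instead of raising."""
    try:
        r = session.get(url, timeout=timeout)
        r.raise_for_status()  # turn 4xx/5xx answers into HTTPError
    except requests.exceptions.HTTPError as e:
        return "HTTPError: {}".format(e.response.status_code)
    except requests.exceptions.RequestException as e:
        # Covers timeouts, connection errors and other request failures.
        return "RequestException: {}".format(e)
    return "good ({})".format(r.status_code)
```

Create one session with make_session() and pass it to check() for each URL you want to test; reusing the session keeps the header in one place.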
More info about requests: https://requests.readthedocs.io/en/master/
Upvotes: 1