mrgloom

Reputation: 21692

urllib.error.URLError: HTTP Error 403: Forbidden from urllib.request.urlopen

I'm trying to get a URL's status via urllib.request.urlopen, and in some cases it raises urllib.error.URLError: HTTP Error 403: Forbidden; however, I can open the same URL in a browser successfully. Is it possible to overcome this problem with urllib, or is it better to use some other library?

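import socket
import ssl
import urllib.error
import urllib.request

def urllib_status(url):
    REQUEST_TIMEOUT = 10

    # Prepend a scheme only when the URL does not already start with one
    if not url.startswith(('http://', 'https://')):
        url = 'http://' + url

    try:
        response = urllib.request.urlopen(url, timeout=REQUEST_TIMEOUT)
        return response.status
    except urllib.error.URLError as e:
        # HTTPError (e.g. 403 Forbidden) is a subclass of URLError
        print('url:'+url)
        print('urllib.error.URLError:', e)
        return -1
    except ssl.SSLError as e:
        print('url:'+url)
        print('ssl.SSLError:', e)
        return -1
    except socket.error as e:
        print('url:'+url)
        print("socket.error: ", e)
        return -1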

Upvotes: 0

Views: 1464

Answers (2)

Railslide

Reputation: 5564

The problem is likely to be due to the site not accepting non-browser requests. You can work around it by overriding the User-Agent header in your request (default is Python-urllib/3.X).

From the Python docs:

 import urllib.request
 opener = urllib.request.build_opener()
 opener.addheaders = [('User-agent', 'Mozilla/5.0')]
 opener.open('http://www.example.com/')

Or, if you're using requests (the de facto standard HTTP library among Python users):

import requests
requests.get('http://www.example.com/', headers={'User-agent': 'Mozilla/5.0'})
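
If you want to stay with urllib, here is a rough sketch of how the same header could be applied to the function from the question. The names build_request and USER_AGENT are made up for illustration, and 'Mozilla/5.0' is just an assumed value; some sites may require a fuller browser string. It also catches urllib.error.HTTPError separately so that a 403 is returned as a status code instead of -1.

```python
import urllib.error
import urllib.request

# Assumed value for illustration; any browser-like string may work.
USER_AGENT = 'Mozilla/5.0'


def build_request(url):
    """Return a Request with a scheme and a browser-like User-Agent."""
    if not url.startswith(('http://', 'https://')):
        url = 'http://' + url
    return urllib.request.Request(url, headers={'User-Agent': USER_AGENT})


def urllib_status(url, timeout=10):
    """Return the HTTP status code, or -1 if no HTTP response was received."""
    try:
        with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as e:
        # The server answered, but with an error status such as 403 or 404.
        return e.code
    except OSError as e:
        # URLError, ssl.SSLError and socket timeouts are all OSError subclasses.
        print('url:', url)
        print('error:', e)
        return -1
```

HTTPError has to be caught before the broader OSError clause, since it is a subclass of URLError (and therefore of OSError).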

Upvotes: 1

mrgloom

Reputation: 21692

It's simpler using requests:

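import requests

def url_status(url):
    # Browser-like User-Agent so the site does not reject the request
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.0; WOW64; rv:24.0)'
                             ' Gecko/20100101 Firefox/24.0'}
    REQUEST_TIMEOUT = 10

    # Prepend a scheme only when the URL does not already start with one
    if not url.startswith(('http://', 'https://')):
        url = 'http://' + url
    try:
        response = requests.get(url, headers=headers, timeout=REQUEST_TIMEOUT)
        if response.status_code != 200:
            print(url)
            print('status', response.status_code)
        return response.status_code
    except requests.exceptions.RequestException as e:
        # Connection errors, timeouts, invalid URLs, etc.
        print(url)
        print('Error', e)
        return -1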

Upvotes: 0
