user7800892

Reputation:

urllib.request.urlopen(url) with Authentication

I've been playing with Beautiful Soup and parsing web pages for a few days. I have been using one line of code that has been my saviour in all the scripts I write:

r = requests.get('some_url', auth=('my_username', 'my_password'))

BUT ...

I want to do the same thing (open a URL with authentication) with:

(1) sauce = urllib.request.urlopen(url).read()
(2) soup = bs.BeautifulSoup(sauce, "html.parser")

I'm not able to open and read a webpage that needs authentication. How do I achieve something like this:

(3) sauce = urllib.request.urlopen(url, auth=(username, password)).read()

instead of (1)?

Upvotes: 45

Views: 107250

Answers (5)

AndroidEngineX

Reputation: 1481

In Python 3, using urllib, you can use the code below:

# Using urllib as it is built in, so there is no need to install any third-party lib
import urllib.error
import urllib.request

from bs4 import BeautifulSoup

auth = (
    "my_username",
    "my_password",
)


def install_authenticated_request_opener(top_level_url):
    """
    This function needs to be called only once.
    Once the opener is installed, subsequent calls use the same authentication.
    """
    password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    password_mgr.add_password(None, top_level_url, auth[0], auth[1])

    handler = urllib.request.HTTPBasicAuthHandler(password_mgr)

    # create "opener" (OpenerDirector instance)
    opener = urllib.request.build_opener(handler)

    # Install the opener.
    # Now all calls to urllib.request.urlopen use our opener.
    urllib.request.install_opener(opener)


def download_html(some_url):
    html_request = urllib.request.Request(some_url)
    try:
        result = urllib.request.urlopen(html_request)
    except urllib.error.URLError as e:
        # handling the error as that is important 😉
        # only HTTPError (a subclass of URLError) carries an HTTP status code
        print(
            "Network call failed. Error code:",
            getattr(e, "code", "no HTTP status code"),
            "reason:",
            e.reason or "missing reason",
        )
        # uncomment to raise this exception
        # raise e
    else:
        # Everything is fine
        html_text = result.read()
        parsed_html = BeautifulSoup(html_text, "html.parser")
        # Do something with the parsed HTML
        return parsed_html
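
For example, you could wire it together like this (a minimal sketch; the URLs below are placeholders, not values from the original answer):

# hypothetical usage: install the opener once, then download and parse a page
install_authenticated_request_opener("http://example.com/")
soup = download_html("http://example.com/some_page.html")
print(soup.title)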

If you are using the requests library, then you can use the code below:

import requests

from bs4 import BeautifulSoup

auth = (
    "my_username",
    "my_password",
)


def download_html(some_url):
    resp = requests.get(some_url, auth=auth)
    if resp.status_code != 200:
        print(
            "Network call failed. Error code:",
            resp.status_code,
        )
        # uncomment to raise an exception for non-2xx responses
        # resp.raise_for_status()
    else:
        # Everything is fine
        html_text = resp.text
        parsed_html = BeautifulSoup(html_text, "html.parser")
        # Do something with the parsed HTML
        return parsed_html

If you are using Python 2, you have to go with the urllib2 solution described in this SO answer.

You can also read this SO answer for the advantages of the requests lib over urllib.

Upvotes: 0

Apurva Singh

Reputation: 5010

Use this. It is the standard urllib that ships with a Python 3 installation and works reliably. Also, see the gist

import urllib.request

url = 'http://192.168.0.1/'

auth_user="username"
auth_passwd="^&%$$%^"

passman = urllib.request.HTTPPasswordMgrWithDefaultRealm()
passman.add_password(None, url, auth_user, auth_passwd)
authhandler = urllib.request.HTTPBasicAuthHandler(passman)
opener = urllib.request.build_opener(authhandler)
urllib.request.install_opener(opener)

res = urllib.request.urlopen(url)
res_body = res.read()
print(res_body.decode('utf-8'))

Upvotes: 3

Skippy le Grand Gourou

Reputation: 7734

With urllib3:

import urllib3

http = urllib3.PoolManager()
myHeaders = urllib3.util.make_headers(basic_auth='my_username:my_password')
http.request('GET', 'http://example.org', headers=myHeaders)
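
The snippet above discards the response; to read the body and hand it to Beautiful Soup as in the question, keep the return value. A minimal sketch, assuming the same placeholder credentials and URL:

import urllib3
from bs4 import BeautifulSoup

http = urllib3.PoolManager()
myHeaders = urllib3.util.make_headers(basic_auth='my_username:my_password')
resp = http.request('GET', 'http://example.org', headers=myHeaders)
soup = BeautifulSoup(resp.data, 'html.parser')  # resp.data holds the raw response bytes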

Upvotes: 6

moritzg

Reputation: 4394

You're using HTTP Basic Authentication:

import urllib2, base64

request = urllib2.Request(url)
base64string = base64.b64encode('%s:%s' % (username, password))
request.add_header("Authorization", "Basic %s" % base64string)   
result = urllib2.urlopen(request)

So you should base64-encode the username and password and send the result in an Authorization header.
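
Note that urllib2 only exists in Python 2. A minimal equivalent sketch for Python 3's urllib.request (where base64.b64encode expects bytes); the URL and credentials below are placeholders:

import base64
import urllib.request

url = 'http://example.com/'   # placeholder
username = 'my_username'      # placeholder
password = 'my_password'      # placeholder

request = urllib.request.Request(url)
credentials = ('%s:%s' % (username, password)).encode('utf-8')
base64string = base64.b64encode(credentials).decode('ascii')
request.add_header("Authorization", "Basic %s" % base64string)
result = urllib.request.urlopen(request)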

Upvotes: 35

Christian König

Reputation: 3570

Have a look at the HOWTO Fetch Internet Resources Using The urllib Package from the official docs:

import urllib.request

# placeholder credentials and URL (use your own values)
username = "my_username"
password = "my_password"
a_url = "http://example.com/foo/page.html"

# create a password manager
password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()

# Add the username and password.
# If we knew the realm, we could use it instead of None.
top_level_url = "http://example.com/foo/"
password_mgr.add_password(None, top_level_url, username, password)

handler = urllib.request.HTTPBasicAuthHandler(password_mgr)

# create "opener" (OpenerDirector instance)
opener = urllib.request.build_opener(handler)

# use the opener to fetch a URL
opener.open(a_url)

# Install the opener.
# Now all calls to urllib.request.urlopen use our opener.
urllib.request.install_opener(opener)
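
Once the opener is installed, the two lines from the question work unchanged for pages under top_level_url; a minimal sketch reusing the placeholder a_url from above:

import bs4 as bs

sauce = urllib.request.urlopen(a_url).read()  # authenticated via the installed opener
soup = bs.BeautifulSoup(sauce, "html.parser")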

Upvotes: 26
