I've been playing with Beautiful Soup and parsing web pages for a few days. One line of code has been my saviour in every script I write:
r = requests.get('some_url', auth=('my_username', 'my_password'))
But I want to do the same thing (open a URL with authentication) using urllib:
(1) sauce = urllib.request.urlopen(url).read()
(2) soup = bs.BeautifulSoup(sauce, "html.parser")
With (1), I can't open and read a web page that requires authentication. How do I achieve something like this:
(3) sauce = urllib.request.urlopen(url, auth=(username, password)).read()
instead of (1)?
Upvotes: 45
Views: 107250
Reputation: 1481
In Python 3, using urllib, you can use the code below:
# urllib is built in, so no third-party library is needed for the download itself
# (BeautifulSoup, used for parsing below, is third-party: pip install beautifulsoup4)
import urllib.error
import urllib.request

from bs4 import BeautifulSoup

auth = (
    "my_username",
    "my_password",
)


def install_authenticated_request_opener(top_level_url):
    """
    This function needs to be called only once.
    Once the opener is installed, subsequent urlopen calls to URLs under
    top_level_url will use the same authentication.
    """
    password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    password_mgr.add_password(None, top_level_url, auth[0], auth[1])
    handler = urllib.request.HTTPBasicAuthHandler(password_mgr)
    # create "opener" (OpenerDirector instance)
    opener = urllib.request.build_opener(handler)
    # Install the opener.
    # Now all calls to urllib.request.urlopen use our opener.
    urllib.request.install_opener(opener)


def download_html(some_url):
    html_request = urllib.request.Request(some_url)
    try:
        result = urllib.request.urlopen(html_request)
    except urllib.error.URLError as e:
        # handling error as that is important 😉
        # only HTTPError has .code; a plain URLError does not
        print(
            "Network call failed. Error code:",
            getattr(e, "code", None) or "no HTTP status code",
            "reason:",
            e.reason or "missing reason",
        )
        # uncomment to re-raise this exception
        # raise
    else:
        # Everything is fine
        htmlText = result.read()
        parsed_html = BeautifulSoup(htmlText, "html.parser")
        # Do something with the parsed HTML
        return parsed_html
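A note on the error handling above: only urllib.error.HTTPError carries a code attribute, while a plain URLError (for example, a refused connection) has just reason. Here is a minimal sketch of a helper that copes with both. The name describe_error is hypothetical, and the two exceptions are built by hand purely for illustration, so no network call is made:

```python
import urllib.error
from email.message import Message


def describe_error(exc):
    """Summarise a URLError or HTTPError without assuming .code exists."""
    # HTTPError (a URLError subclass) has .code; a plain URLError does not.
    code = getattr(exc, "code", None)
    code_text = code if code is not None else "no HTTP status code"
    return f"code={code_text} reason={exc.reason}"


# Hand-built exceptions, only to show the two shapes (no network involved).
plain_error = urllib.error.URLError("connection refused")
http_error = urllib.error.HTTPError(
    "http://example.com/", 401, "Unauthorized", Message(), None
)

print(describe_error(plain_error))
print(describe_error(http_error))
```

Catching URLError still covers both cases, because HTTPError is a subclass of it.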
If you are using the requests library, you can use the code below:
import requests
from bs4 import BeautifulSoup

auth = (
    "my_username",
    "my_password",
)


def download_html(some_url):
    resp = requests.get(some_url, auth=auth)
    if resp.status_code != 200:
        print(
            "Network call failed. Error code:",
            resp.status_code or "no HTTP status code",
        )
        # uncomment to raise an exception for 4xx/5xx responses
        # resp.raise_for_status()
    else:
        # Everything is fine
        htmlText = resp.text
        parsed_html = BeautifulSoup(htmlText, "html.parser")
        # Do something with the parsed HTML
        return parsed_html
If you are using Python 2, see this SO post for the urllib solution. You can also read this SO post on the advantages of the requests library over urllib.
Upvotes: 0
Reputation: 5010
Use this. It relies only on the standard urllib that ships with Python 3. Also, see the gist.
import urllib.request
url = 'http://192.168.0.1/'
auth_user = "username"
auth_passwd = "^&%$$%^"
passman = urllib.request.HTTPPasswordMgrWithDefaultRealm()
passman.add_password(None, url, auth_user, auth_passwd)
authhandler = urllib.request.HTTPBasicAuthHandler(passman)
opener = urllib.request.build_opener(authhandler)
urllib.request.install_opener(opener)
res = urllib.request.urlopen(url)
res_body = res.read()
print(res_body.decode('utf-8'))
Upvotes: 3
Reputation: 7734
With urllib3:
import urllib3
http = urllib3.PoolManager()
myHeaders = urllib3.util.make_headers(basic_auth='my_username:my_password')
response = http.request('GET', 'http://example.org', headers=myHeaders)
Upvotes: 6
Reputation: 4394
You're using HTTP Basic Authentication (the example below is Python 2, where urllib2 is available):
import urllib2, base64
request = urllib2.Request(url)
base64string = base64.b64encode('%s:%s' % (username, password))
request.add_header("Authorization", "Basic %s" % base64string)
result = urllib2.urlopen(request)
So you should base64-encode the username and password and send the result as an Authorization header.
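For Python 3, the same idea can be sketched with urllib.request. This is a minimal sketch, not a definitive implementation: the helper name build_basic_auth_request and the URL and credentials are placeholders I made up for illustration. The main difference from Python 2 is that base64.b64encode works on bytes, so the credentials are encoded first and the result decoded back to text:

```python
import base64
import urllib.request


def build_basic_auth_request(url, username, password):
    """Return a Request carrying a preemptive HTTP Basic Authorization header."""
    # b64encode works on bytes in Python 3, so encode first and decode after.
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    request = urllib.request.Request(url)
    request.add_header("Authorization", f"Basic {token}")
    return request


# Placeholder URL and credentials, for illustration only.
request = build_basic_auth_request("http://example.com/", "user", "pass")
print(request.get_header("Authorization"))  # Basic dXNlcjpwYXNz
# sauce = urllib.request.urlopen(request).read()  # performs the actual fetch
```

Because the header goes out with the first request, this does not wait for the server to answer with a 401 challenge. Base64 is an encoding, not encryption, so this is best used over HTTPS.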
Upvotes: 35
Reputation: 3570
Have a look at the HOWTO Fetch Internet Resources Using The urllib Package from the official docs:
# create a password manager
password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
# Add the username and password.
# If we knew the realm, we could use it instead of None.
top_level_url = "http://example.com/foo/"
password_mgr.add_password(None, top_level_url, username, password)
handler = urllib.request.HTTPBasicAuthHandler(password_mgr)
# create "opener" (OpenerDirector instance)
opener = urllib.request.build_opener(handler)
# use the opener to fetch a URL
opener.open(a_url)
# Install the opener.
# Now all calls to urllib.request.urlopen use our opener.
urllib.request.install_opener(opener)
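To see which URLs the password manager will supply credentials for, you can query it directly with find_user_password. A small sketch, with a placeholder URL and credentials chosen for illustration; it makes no network call:

```python
import urllib.request

password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
# Placeholder URL and credentials, for illustration only.
password_mgr.add_password(None, "http://example.com/foo/", "user", "pass")

# A URL below the registered prefix resolves to the stored credentials.
inside = password_mgr.find_user_password(None, "http://example.com/foo/bar.html")
# A URL outside the prefix gets no credentials.
outside = password_mgr.find_user_password(None, "http://example.com/other/")

print(inside)
print(outside)
```

So top_level_url acts as a prefix: pages "deeper" than it are covered, while other paths on the same host are not.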
Upvotes: 26