Reputation: 427
In a Python 2.x script, I want to check whether an HTTPS page returns particular content (I will likely need to parse the page body to determine this). The page is also behind an htpasswd (HTTP Basic authentication) prompt, which needs a username and password before the content is visible. So I am looking for a module or other functionality that lets me hardcode a username and password, fetch the page, and inspect the output (i.e. check whether the keywords that mark the equivalent of a 404 page are present).
I was having a look at http://docs.python.org/2/library/httplib.html but it doesn't seem to do what I am looking for.
Upvotes: 0
Views: 671
Reputation: 1121486
You could do it with the httplib module, but there are easier methods that don't require manually driving the HTTP protocol.
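For a sense of what that manual driving involves, here is a minimal sketch using httplib directly; the host, path, and credentials are placeholders, and you have to assemble the Basic auth header yourself:
import base64
import httplib

# Placeholder host, path and credentials, for illustration only.
host, path = 'example.com', '/protected/page'
credentials = base64.b64encode('someusername:somepassword')

connection = httplib.HTTPSConnection(host)
connection.request('GET', path,
                   headers={'Authorization': 'Basic ' + credentials})
response = connection.getresponse()
print response.status, response.reason  # e.g. 200 OK or 404 Not Found
content = response.read()
connection.close()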
Using the requests library (an external module that must be installed first) is probably simplest:
import requests
auth = ('someusername', 'somepassword')
response = requests.get(yoururl, auth=auth)
response.raise_for_status()
This will raise an exception (requests.exceptions.HTTPError) if the server answered with an error status (4xx or 5xx), including a 404 Not Found.
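If you'd rather handle the failure than let it propagate, you can catch that exception; a small sketch, reusing the same placeholder URL and credentials:
import requests

auth = ('someusername', 'somepassword')
try:
    response = requests.get(yoururl, auth=auth)
    response.raise_for_status()
except requests.exceptions.HTTPError as exc:
    # 4xx/5xx answer, e.g. 401 if the credentials were rejected.
    print 'Request failed:', exc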
You can then further parse the response body with response.content (a byte string) or response.text (a unicode string).
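For your original use case, where a "missing" page may still come back with a 200 status and you want to look for keywords in the body, a check might look like this ('Page not found' is a placeholder for whatever your 404-equivalent page actually contains):
response = requests.get(yoururl, auth=auth)
response.raise_for_status()
if 'Page not found' in response.text:
    print 'Got the soft-404 page'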
Using just the standard library, the urllib2 module version would look like:
import urllib2, base64

request = urllib2.Request(yoururl)
# Build the Basic auth header by hand; strip() removes the trailing
# newline that base64.encodestring() appends.
authstring = base64.encodestring('{}:{}'.format('someusername', 'somepassword')).strip()
request.add_header("Authorization", "Basic {}".format(authstring))
response = urllib2.urlopen(request)  # raises urllib2.HTTPError on 4xx/5xx
if not 200 <= response.getcode() < 400:
    # error response, raise an exception here?
    pass
content = response.read()
try:
    # getparam() takes only the parameter name; fall back to UTF-8
    # when the server declared no charset.
    charset = response.info().getparam('charset') or 'utf8'
    text = content.decode(charset)
except UnicodeDecodeError:
    text = content.decode('ascii', 'replace')
where content is the byte string contents of the response body, and text would be the unicode value, up to a point (the charset handling here is best-effort).
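If you need this in more than one place, the standard-library recipe could be wrapped in a small helper; a sketch under the same assumptions (fetch_page is a hypothetical name, and the soft-404 marker is a placeholder):
import urllib2, base64

def fetch_page(url, username, password):
    # Hypothetical helper wrapping the urllib2 recipe above; returns
    # the decoded body, and lets urllib2.HTTPError propagate on 4xx/5xx.
    request = urllib2.Request(url)
    authstring = base64.encodestring('{}:{}'.format(username, password)).strip()
    request.add_header('Authorization', 'Basic {}'.format(authstring))
    response = urllib2.urlopen(request)
    content = response.read()
    charset = response.info().getparam('charset') or 'utf8'
    try:
        return content.decode(charset)
    except UnicodeDecodeError:
        return content.decode('ascii', 'replace')

text = fetch_page(yoururl, 'someusername', 'somepassword')
if 'Page not found' in text:  # placeholder soft-404 marker
    print 'Got the soft-404 page'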
Upvotes: 2