Reputation: 1622
I have a Python script that does an automatic download from a URL once a day.
Recently the authentication protecting the URL was changed. To get it to work with Internet Explorer I had to enable DES for Kerberos by setting SupportedEncryptionTypes to 0x7FFFFFFF in a registry entry. IE now prompts me for my domain, user name and password when I browse to the site.
My Python code, which was working before, is:
def __build_ntlm_opener(self):
    passman = HTTPPasswordMgrWithDefaultRealm()
    passman.add_password(None, self.answers_url, self.ntlm_username, self.ntlm_password)
    ntlm_handler = HTTPNtlmAuthHandler(passman)
    opener = urllib.request.build_opener(ntlm_handler)
    opener.addheaders = [
        #('User-agent', 'Mozilla/5.0 (Windows NT 6.0; rv:5.0) Gecko/20100101 Firefox/5.0')
        ('User-agent', 'Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0)')
    ]
    return opener
Now the code is failing with a simple 401 when using the opener:
urllib.error.HTTPError: HTTP Error 401: Unauthorized
I don't know much about Kerberos or DES, and from what I have seen so far I can't tell whether urllib supports them.
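One thing that might narrow this down is looking at the WWW-Authenticate header on the 401 response, since it lists the schemes the server is now asking for (Negotiate usually means Kerberos/SPNEGO, while NTLM is what the handler above speaks). This is only a rough sketch: the helper names auth_schemes and schemes_offered are made up for illustration, and it assumes each header value carries a single challenge whose first token is the scheme name.

```python
import urllib.error
import urllib.request


def auth_schemes(header_values):
    """Return the scheme name (first token) of each WWW-Authenticate value.

    Assumption: one challenge per header value, e.g. 'Negotiate' or
    'Basic realm="x"'. Comma-joined challenges are not split here.
    """
    schemes = []
    for value in header_values or []:
        parts = value.split(None, 1)
        if parts:
            schemes.append(parts[0])
    return schemes


def schemes_offered(url):
    """Request url without credentials and report the schemes a 401 offers."""
    try:
        urllib.request.urlopen(url)
    except urllib.error.HTTPError as err:
        if err.code == 401:
            # err.headers is a message object; get_all returns None if absent.
            return auth_schemes(err.headers.get_all("WWW-Authenticate"))
        raise
    return []  # no challenge: the request succeeded without authentication
```

Calling schemes_offered(self.answers_url) from the script would show whether NTLM is still among the offered schemes or whether only Negotiate is left.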
Is there any third-party library or trick I can use to get this working again?
Upvotes: 2
Views: 1897
Reputation: 25579
You could try using Selenium's WebDriver to drive a browser directly. I do that sometimes when I want to scrape sites that are dynamically generated. Here's a code example for opening a page and entering a password:
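from selenium import webdriver
from selenium.webdriver.common.by import By

b = webdriver.Chrome()
b.get('http://www.example.com')
# Fill in the login form; the element IDs depend on the target page.
username_field = b.find_element(By.ID, 'username')
username_field.send_keys('my_username')
password_field = b.find_element(By.ID, 'password')
password_field.send_keys('secret')
# click() returns None, so there is nothing useful to assign here.
b.find_element(By.LINK_TEXT, 'login').click()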
That would get you past a typical website login screen. Then
b.page_source
will give you the source code for the page, even if it was mainly generated with JavaScript.
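Once you have the page source as a string, the standard library is probably enough to pull out something like a download link. Here is a minimal sketch, assuming the link you want is an ordinary <a href="..."> tag. LinkCollector, extract_links and the sample HTML are made up for illustration; in the real script you would pass b.page_source instead.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects the href of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(page_source):
    """Return all href values found in the given HTML string."""
    parser = LinkCollector()
    parser.feed(page_source)
    parser.close()
    return parser.links


# Made-up stand-in for what b.page_source would return.
sample_source = (
    '<html><body>'
    '<a href="/reports/today.csv">Download</a>'
    '<a name="top">no href here</a>'
    '</body></html>'
)
print(extract_links(sample_source))
```

Anchors without an href are skipped, so the list only contains URLs you could actually fetch.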
Selenium's own source code for WebElement is also very easy to read: http://code.google.com/p/selenium/source/browse/trunk/py/selenium/webdriver/remote/webelement.py
Upvotes: 1