Reputation: 197
I want to get the URL redirection log using mechanize in Python, for example www.google.com --> www.google.co.in. The exact question has been asked before on SO, but for Ruby:
How to get redirect log in Mechanize?
The answer explains that to do this one can do the following in Ruby -
for m.redirection_limit in 0..99
  begin
    m.get(url)
    break
  rescue WWW::Mechanize::RedirectLimitReachedError
    # code here could get control at
    # intermediate redirection levels
  end
end
I want to do the same in Python. Any help? What is the equivalent of get(url) in Python's mechanize?
Upvotes: 2
Views: 2233
Reputation: 400
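J.F. Sebastian's answer works great for HTTP redirections, but it misses redirections that happen inside the page. urllib2 only follows HTTP redirects, whereas mechanize also follows Refresh headers and meta refresh tags. (Neither library executes JavaScript, so purely script-driven redirects are not followed by either.)
This should log both HTTP and refresh redirections: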
import mechanize
import logging
import sys
logger = logging.getLogger("mechanize")
logger.addHandler(logging.StreamHandler(sys.stdout))
logger.setLevel(logging.INFO)
browser = mechanize.Browser()
browser.set_debug_redirects(True)
r = browser.open("http://google.com")
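If you want the redirect log as data rather than as text on stdout, one option is to attach a handler that collects the messages in a list. This is only a sketch in Python 3: `ListHandler`, `attach_redirect_log` and `open_with_redirect_log` are names made up for illustration, and it assumes mechanize logs redirects under the "mechanize.http_redirects" logger, which propagates to the "mechanize" logger used above. The exact wording of the messages mechanize emits is not assumed here.

```python
import logging


class ListHandler(logging.Handler):
    """Logging handler that keeps each log message in a list."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        # getMessage() applies the %-style arguments to the message.
        self.messages.append(record.getMessage())


def attach_redirect_log(level=logging.INFO):
    """Attach a ListHandler to the "mechanize" logger and return it."""
    handler = ListHandler()
    logger = logging.getLogger("mechanize")
    logger.addHandler(handler)
    logger.setLevel(level)
    return handler


def open_with_redirect_log(url):
    """Open url with mechanize and return the collected log messages."""
    import mechanize  # third-party; imported here so the rest works without it

    handler = attach_redirect_log()
    browser = mechanize.Browser()
    browser.set_debug_redirects(True)
    browser.open(url)
    return handler.messages
```

Calling `open_with_redirect_log("http://google.com")` would then return whatever mechanize logged while following the redirects, one string per log record.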
Upvotes: 1
Reputation: 414207
You could override the HTTPRedirectHandler.redirect_request() method to save a redirection history:
import urllib2
class HTTPRedirectHandler(urllib2.HTTPRedirectHandler):
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        newreq = urllib2.HTTPRedirectHandler.redirect_request(self,
            req, fp, code, msg, headers, newurl)
        if newreq is not None:
            self.redirections.append(newreq.get_full_url())
        return newreq
url = 'http://google.com'
h = HTTPRedirectHandler()
h.max_redirections = 100
h.redirections = [url]
opener = urllib2.build_opener(h)
response = opener.open(url)
print h.redirections
# -> ['http://google.com', 'http://www.google.com/', 'http://google.com.ua/']
It should be much faster than the provided WWW::Mechanize code snippet because urllib2 visits each URL only once.
mechanize provides a superset of urllib2 functionality, i.e., if you use mechanize, just replace every occurrence of urllib2 above with mechanize and it will work.
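For readers on Python 3, where urllib2 became urllib.request, the same idea might look roughly like the sketch below. `RecordingRedirectHandler` and `redirect_chain` are illustrative names, not part of any library, and the history is kept per handler instance rather than set from outside as in the code above.

```python
import urllib.request


class RecordingRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Redirect handler that remembers every URL it is redirected to."""

    def __init__(self, start_url):
        # Start the history with the URL originally requested.
        self.redirections = [start_url]

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        newreq = super().redirect_request(req, fp, code, msg, headers, newurl)
        # The parent returns None when it refuses to follow the redirect.
        if newreq is not None:
            self.redirections.append(newreq.full_url)
        return newreq


def redirect_chain(url):
    """Fetch url and return the list of URLs visited, in order."""
    handler = RecordingRedirectHandler(url)
    # Passing a subclass instance replaces the default redirect handler.
    opener = urllib.request.build_opener(handler)
    with opener.open(url) as response:
        response.read()
    return handler.redirections
```

`redirect_chain("http://google.com")` would return the starting URL followed by each redirect target; the actual list depends on where the request is made from.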
Upvotes: 2
Reputation: 26160
I was going to give you an 'IGIFY', but you are right, mechanize documentation sucks. Poking around a bit, it looks like you should look at urllib2, as mechanize exposes that entire interface.
Upvotes: 1