anu.agg

Reputation: 197

Use Mechanize (Python) to get redirection log

I want to get the URL redirection log using Mechanize in Python, for example www.google.com --> www.google.co.in. This exact question has been asked before on SO, but for Ruby:

How to get redirect log in Mechanize?

The answer there explains that in Ruby one can do the following:

for m.redirection_limit in 0..99
  begin
    m.get(url)
    break
  rescue WWW::Mechanize::RedirectLimitReachedError
    # code here could get control at
    # intermediate redirection levels
  end
end

I want to do the same in Python. Any help? What is the Python Mechanize equivalent of get(url)?

Upvotes: 2

Views: 2233

Answers (3)

user926321

Reputation: 400

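J.F. Sebastian's answer works great for HTTP redirections, but it would miss redirections that are not issued through an HTTP 3xx status, such as Refresh-style redirections. (urllib2 doesn't handle those, but mechanize does. Note that mechanize does not execute JavaScript, so a redirect performed purely by a script will not be followed.)

This should log both HTTP redirections and Refresh redirections: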

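import logging
import sys

import mechanize

# Send mechanize's log output to stdout at INFO level.
logger = logging.getLogger("mechanize")
logger.addHandler(logging.StreamHandler(sys.stdout))
logger.setLevel(logging.INFO)

browser = mechanize.Browser()
browser.set_debug_redirects(True)  # log each redirect as it is followed

# open() is the Python counterpart of Ruby's get(url)
r = browser.open("http://google.com")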
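If you want the redirect log as data rather than as printed text, one option is to attach a handler that stores the messages in a list. This is only a sketch: `ListHandler` and `capture_mechanize_log` are names made up for this example, and the exact wording of the messages mechanize logs is not something to rely on.

```python
import logging


class ListHandler(logging.Handler):
    """Hypothetical helper: keeps every formatted log message in a list."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        # getMessage() applies the %-style arguments to the message.
        self.messages.append(record.getMessage())


def capture_mechanize_log(level=logging.INFO):
    """Attach a ListHandler to the "mechanize" logger and return it.

    Loggers below "mechanize" in the hierarchy propagate their records
    to this handler as well.
    """
    handler = ListHandler()
    logger = logging.getLogger("mechanize")
    logger.addHandler(handler)
    logger.setLevel(level)
    return handler
```

Call `capture_mechanize_log()` before `browser.set_debug_redirects(True)` and `browser.open(...)`, then read `handler.messages` afterwards.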

Upvotes: 1

jfs

Reputation: 414207

You could override the HTTPRedirectHandler.redirect_request() method to save a redirection history:

import urllib2

class HTTPRedirectHandler(urllib2.HTTPRedirectHandler):
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        newreq = urllib2.HTTPRedirectHandler.redirect_request(self,
            req, fp, code, msg, headers, newurl)
        if newreq is not None:
            self.redirections.append(newreq.get_full_url())
        return newreq

url = 'http://google.com'

h = HTTPRedirectHandler()
h.max_redirections = 100
h.redirections = [url]
opener = urllib2.build_opener(h)
response = opener.open(url)
print h.redirections
# -> ['http://google.com', 'http://www.google.com/', 'http://google.com.ua/']

It should be much faster than the WWW::Mechanize snippet from the question, because urllib2 visits each URL only once, whereas the Ruby loop restarts the request for every redirection limit it tries.

mechanize provides a superset of urllib2's functionality, i.e., if you use mechanize, just replace every occurrence of urllib2 above with mechanize and it will work.
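On Python 3, urllib2 no longer exists; its handlers live in `urllib.request`. A rough port of the same idea might look like the sketch below. The names `RecordingRedirectHandler` and `redirect_chain` are made up for this example, and it only records HTTP 3xx redirects, exactly like the urllib2 version.

```python
import urllib.request


class RecordingRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Hypothetical name: remembers every URL the opener is redirected to."""

    max_redirections = 100  # same limit as in the urllib2 version

    def __init__(self, start_url):
        super().__init__()
        self.redirections = [start_url]

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        newreq = super().redirect_request(req, fp, code, msg, headers, newurl)
        if newreq is not None:
            self.redirections.append(newreq.full_url)
        return newreq


def redirect_chain(url):
    """Open url and return the list of URLs visited, starting with url."""
    handler = RecordingRedirectHandler(url)
    opener = urllib.request.build_opener(handler)
    with opener.open(url):
        pass  # only the redirect history is of interest here
    return handler.redirections
```

Calling `redirect_chain('http://google.com')` should return a list like the one printed in the urllib2 version; the actual URLs depend on where the request is made from.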

Upvotes: 2

Silas Ray

Reputation: 26160

I was going to give you an 'IGIFY', but you are right, the mechanize documentation sucks. Poking around a bit, it looks like you should look at urllib2, as mechanize exposes that entire interface.

Upvotes: 1
