Fluffy

Reputation: 28362

How to get redirect log in Mechanize?

In Ruby, if you use Mechanize to follow 301/302 redirects like this:

require 'mechanize'

m = WWW::Mechanize.new
m.get('http://google.com')

how do you get the list of pages Mechanize was redirected through? (For example, http://google.com => http://www.google.com => http://google.com.ua)

OK, here is the code in Mechanize responsible for redirection:

 elsif res_klass <= Net::HTTPRedirection
        return page unless follow_redirect?
        log.info("follow redirect to: #{ response['Location'] }") if log
        from_uri  = page.uri
        raise RedirectLimitReachedError.new(page, redirects) if redirects + 1 > redirection_limit
        redirect_verb = options[:verb] == :head ? :head : :get
        page = fetch_page(  :uri => response['Location'].to_s,
                            :referer => page,
                            :params  => [],
                            :verb => redirect_verb,
                            :redirects => redirects + 1
                         )
        @history.push(page, from_uri)
        return page

But calling m.history.map {|p| puts p.uri} prints the URI of the last page three times.

Upvotes: 0

Views: 3329

Answers (2)

fuzzygroup

Reputation: 1149

The key here is to take advantage of the logging built into Mechanize. Here's a full code sample using Ruby's standard Logger class.

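require 'mechanize'
require 'logger'

# Write Mechanize's log output to a file (the log/ directory must already exist)
mechanize_logger = Logger.new('log/mechanize.log')
mechanize_logger.level = Logger::INFO

url = 'http://google.com'

agent = Mechanize.new
agent.log = mechanize_logger   # attach the logger before making requests
agent.get(url)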

Then check log/mechanize.log and you'll see the whole Mechanize process, including the intermediate URLs.
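If you want the redirect targets as a Ruby array rather than reading the log file by eye, one option is to point the logger at an in-memory buffer and scan it afterwards. This is only a rough sketch: it assumes the "follow redirect to: ..." message format shown in the Mechanize source quoted in the question (other versions may word it differently), redirect_targets is a made-up helper name, and the two log lines are written by hand so the snippet runs without the gem or a network connection.

```ruby
require 'logger'
require 'stringio'

# Hypothetical helper: extract redirect targets from captured log text.
# Assumes the "follow redirect to: <url>" message seen in the quoted source.
def redirect_targets(log_text)
  log_text.scan(/follow redirect to: (\S+)/).flatten
end

log_io = StringIO.new
logger = Logger.new(log_io)
logger.level = Logger::INFO

# With Mechanize you would attach the logger and fetch the page:
#   agent = Mechanize.new
#   agent.log = logger
#   agent.get('http://google.com')
# Here the messages are simulated by hand (illustrative URLs only).
logger.info('follow redirect to: http://www.google.com')
logger.info('follow redirect to: http://google.com.ua')

puts redirect_targets(log_io.string)
```

The starting URL itself is not in the log, so prepend it if you need the full chain.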

Upvotes: 2

DigitalRoss

Reputation: 146053

I'm not certain, but here are a couple of things to try:

  1. See what's in m.history[i].uri after the get()

  2. You might need something like:

    for m.redirection_limit in 0..99
      begin
        m.get(url)
        break
      rescue WWW::Mechanize::RedirectLimitReachedError
        # code here could get control at
        # intermediate redirection levels
      end
    end
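The same hop-by-hop idea can also be sketched without Mechanize, using Net::HTTP from the standard library, which does not follow redirects on its own. This is not Mechanize's API, just an alternative way to see each intermediate URL. The redirect_chain name and its limit: and fetch: parameters are made up for this example; fetch: exists only so the network call can be swapped out.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: follow redirects one hop at a time and return every URL visited.
# limit caps the number of requests; fetch is any callable taking a URI and
# returning a Net::HTTPResponse (defaults to a real GET request).
def redirect_chain(start_url, limit: 10, fetch: ->(uri) { Net::HTTP.get_response(uri) })
  chain = [start_url]
  uri = URI(start_url)
  limit.times do
    response = fetch.call(uri)
    location = response['Location']
    # Stop at the first response that is not a redirect with a Location header
    break unless response.is_a?(Net::HTTPRedirection) && location
    # Location may be relative, so resolve it against the current URI
    uri = URI.join(uri.to_s, location)
    chain << uri.to_s
  end
  chain
end

# Example call (needs network access):
#   puts redirect_chain('http://google.com')
```

Note that this only sees HTTP-level redirects. Cookies, meta refresh tags, and anything else Mechanize handles for you are ignored.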

Upvotes: 0
