John

Reputation: 15286

Is there an easy way to request a URL in python and NOT follow redirects?

Looking at the source of urllib2, the easiest way seems to be to subclass HTTPRedirectHandler and then use build_opener to override the default HTTPRedirectHandler, but that is a lot of (relatively complicated) work for something that should be pretty simple.

Upvotes: 156

Views: 130818

Answers (7)

olt

Reputation: 2347

Dive Into Python has a good chapter on handling redirects with urllib2. Another solution is httplib, which never follows redirects on its own:

>>> import httplib
>>> conn = httplib.HTTPConnection("www.bogosoft.com")
>>> conn.request("GET", "")
>>> r1 = conn.getresponse()
>>> print r1.status, r1.reason
301 Moved Permanently
>>> print r1.getheader('Location')
http://www.bogosoft.com/new/location
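In Python 3, httplib was renamed http.client, and it still returns the 3xx response instead of following it. Below is a minimal sketch of the same session. The RedirectingHandler server and its /new/location target are made up purely so the snippet runs offline against 127.0.0.1:

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class RedirectingHandler(BaseHTTPRequestHandler):
    """Demo-only server (hypothetical): answers every GET with a 301."""

    def do_GET(self):
        self.send_response(301)
        self.send_header("Location", "http://127.0.0.1/new/location")
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):
        pass  # silence the default request logging


server = HTTPServer(("127.0.0.1", 0), RedirectingHandler)  # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
host, port = server.server_address

# http.client does not follow redirects: the 3xx response comes back as-is.
conn = http.client.HTTPConnection(host, port)
conn.request("GET", "/")
r1 = conn.getresponse()
status, reason = r1.status, r1.reason
location = r1.getheader("Location")
r1.read()  # drain the (empty) body
conn.close()
server.shutdown()
server.server_close()

print(status, reason)  # 301 Moved Permanently
print(location)        # http://127.0.0.1/new/location
```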

Upvotes: 36

Marian

Reputation: 15338

Here is the Requests way:

import requests
r = requests.get('http://github.com', allow_redirects=False)
print(r.status_code, r.headers.get('Location'))  # Location is None if the response was not a redirect

Upvotes: 293

Ian Mackinnon

Reputation: 14238

The redirections keyword in the httplib2 request method is a red herring. Rather than returning the first response, it raises a RedirectLimit exception if it receives a redirection status code. To get the initial response, set follow_redirects to False on the Http object:

import httplib2
h = httplib2.Http()
h.follow_redirects = False
(response, body) = h.request("http://example.com")

Upvotes: 9

Tzury Bar Yochay

Reputation: 9004

The shortest way, however, is:

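import urllib2

class NoRedirect(urllib2.HTTPRedirectHandler):
    def redirect_request(self, req, fp, code, msg, hdrs, newurl):
        # Returning None means no follow-up request is built;
        # urllib2 then raises HTTPError carrying the 3xx code.
        pass

noredir_opener = urllib2.build_opener(NoRedirect())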
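A rough Python 3 port of this handler, assuming urllib.request in place of urllib2. The local Redirect302 server, its /elsewhere target, and the ProxyHandler({}) argument (which only keeps any *_proxy environment variables from rerouting the demo request) are additions for illustration. Because redirect_request returns None, the 302 surfaces as an HTTPError you can inspect:

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class NoRedirect(urllib.request.HTTPRedirectHandler):
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # No follow-up request is built; the 3xx falls through to
        # HTTPDefaultErrorHandler, which raises HTTPError.
        return None


# ProxyHandler({}) is only here so *_proxy env vars can't reroute the demo.
noredir_opener = urllib.request.build_opener(
    urllib.request.ProxyHandler({}), NoRedirect())


class Redirect302(BaseHTTPRequestHandler):
    """Demo-only server (hypothetical): answers every GET with a 302."""

    def do_GET(self):
        self.send_response(302)
        self.send_header("Location", "/elsewhere")
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):
        pass


server = HTTPServer(("127.0.0.1", 0), Redirect302)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://%s:%d/" % server.server_address

code = location = None
try:
    noredir_opener.open(url)
except urllib.error.HTTPError as e:
    code, location = e.code, e.headers["Location"]
    e.close()
finally:
    server.shutdown()
    server.server_close()

print(code, location)  # 302 /elsewhere
```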

Upvotes: 6

Carles Barrobés

Reputation: 11683

This is a urllib2 handler that will not follow redirects:

import urllib
import urllib2

class NoRedirectHandler(urllib2.HTTPRedirectHandler):
    def http_error_302(self, req, fp, code, msg, headers):
        # Wrap the 3xx response and hand it back instead of following it.
        infourl = urllib.addinfourl(fp, headers, req.get_full_url())
        infourl.status = code
        infourl.code = code
        return infourl
    http_error_300 = http_error_302
    http_error_301 = http_error_302
    http_error_303 = http_error_302
    http_error_307 = http_error_302

opener = urllib2.build_opener(NoRedirectHandler())
urllib2.install_opener(opener)
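The same return-the-response idea is shorter in Python 3, because urllib.request already passes the received HTTPResponse in as fp, so there is nothing to wrap. This is a sketch, assuming it is fine to also map 308 (Permanent Redirect) to the same method; the Redirect307 server and its /temporary target are hypothetical demo scaffolding, and ProxyHandler({}) just keeps proxy environment variables out of the local test:

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class NoRedirectHandler(urllib.request.HTTPRedirectHandler):
    def http_error_302(self, req, fp, code, msg, headers):
        # fp is the HTTPResponse urllib already received; returning it
        # makes opener.open() hand back the 3xx instead of following it.
        return fp

    http_error_300 = http_error_301 = http_error_303 = http_error_302
    http_error_307 = http_error_308 = http_error_302


# ProxyHandler({}) only keeps *_proxy env vars out of the local demo.
opener = urllib.request.build_opener(
    urllib.request.ProxyHandler({}), NoRedirectHandler())


class Redirect307(BaseHTTPRequestHandler):
    """Demo-only server (hypothetical): answers every GET with a 307."""

    def do_GET(self):
        self.send_response(307)
        self.send_header("Location", "/temporary")
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):
        pass


server = HTTPServer(("127.0.0.1", 0), Redirect307)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://%s:%d/" % server.server_address

try:
    with opener.open(url) as resp:
        status, location = resp.status, resp.headers["Location"]
finally:
    server.shutdown()
    server.server_close()

print(status, location)  # 307 /temporary
```

Unlike the raise-HTTPError approach, callers get an ordinary response object back and check its status themselves.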

Upvotes: 12

Ashish

Reputation: 430

I suppose this would help:

from httplib2 import Http

def get_html(uri, num_redirections=0):  # 0 means: do not follow redirects
    conn = Http()
    return conn.request(uri, redirections=num_redirections)

Upvotes: 8

Aaron Maenpaa

Reputation: 122890

I second olt's pointer to Dive Into Python. Here's an implementation using urllib2 redirect handlers. More work than it should be? Maybe. Shrug.

import sys
import urllib2

class RedirectHandler(urllib2.HTTPRedirectHandler):
    # Raise without delegating to the base class: the base-class
    # http_error_301/302 would already request the redirect target.
    def http_error_301(self, req, fp, code, msg, headers):
        raise Exception("Permanent Redirect: %s -> %s"
                        % (code, headers.get('Location')))

    def http_error_302(self, req, fp, code, msg, headers):
        raise Exception("Temporary Redirect: %s -> %s"
                        % (code, headers.get('Location')))

def main(script_name, url):
    opener = urllib2.build_opener(RedirectHandler)
    urllib2.install_opener(opener)
    print urllib2.urlopen(url).read()

if __name__ == "__main__":
    main(*sys.argv)
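A Python 3 sketch of the raise-an-exception approach. RedirectError is a hypothetical exception class, not part of urllib; the handler raises it before the base class gets a chance to request the new location, so nothing is followed. The Redirect301 server and its /moved target exist only so the example runs offline:

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class RedirectError(Exception):
    """Hypothetical exception type; not part of urllib."""

    def __init__(self, code, location):
        super().__init__("%d redirect to %s" % (code, location))
        self.code = code
        self.location = location


class RaisingRedirectHandler(urllib.request.HTTPRedirectHandler):
    def http_error_302(self, req, fp, code, msg, headers):
        # Raise before the base class builds a request for the new URL.
        fp.close()
        raise RedirectError(code, headers.get("Location"))

    http_error_301 = http_error_303 = http_error_302
    http_error_307 = http_error_308 = http_error_302


class Redirect301(BaseHTTPRequestHandler):
    """Demo-only server (hypothetical): answers every GET with a 301."""

    def do_GET(self):
        self.send_response(301)
        self.send_header("Location", "/moved")
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):
        pass


server = HTTPServer(("127.0.0.1", 0), Redirect301)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://%s:%d/" % server.server_address

# ProxyHandler({}) only keeps *_proxy env vars out of the local demo.
opener = urllib.request.build_opener(
    urllib.request.ProxyHandler({}), RaisingRedirectHandler())

caught = None
try:
    opener.open(url)
except RedirectError as err:
    caught = err
finally:
    server.shutdown()
    server.server_close()

print(caught)  # 301 redirect to /moved
```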

Upvotes: 5
