joeButler

Reputation: 1711

python requests - remove headers on redirect

I'm using the Python Requests library for some HTTP checking on an application. I need to send a custom Host header on the initial request, but that header should not be used when following redirects, where it causes a problem.

I've looked through the Requests docs, but I can't see a way to make Requests drop request headers when following redirects.

Here is an example of my problem:

import requests
from requests.structures import CaseInsensitiveDict

s = requests.Session()
request_headers = CaseInsensitiveDict()
request_headers['host'] = 'google.co.uk'

response = s.get("http://google.co.uk", allow_redirects=True, headers=request_headers)

In this case, http://google.co.uk redirects to https://www.google.co.uk, but Requests gets stuck in a redirect loop because it keeps sending the Host header set to 'google.co.uk' even after following the redirect.

I always need to set the Host header manually on the first request, because the traffic goes through a CDN that uses that header to determine which site it is serving. Removing it from the initial request is not an option.

Here is the equivalent curl command, which does drop the Host header after the initial request. This is the behaviour I would expect from Python Requests:

curl -H "Host: google.co.uk" http://google.co.uk -L -o /dev/null 

Upvotes: 2

Views: 4992

Answers (2)

MarClown

Reputation: 1

Very late reply: I came across this post while looking for something else, but it may help somebody looking for an answer.

Look into the Requests event hook mechanism (https://2.python-requests.org/en/master/user/advanced/#event-hooks). You can register a callback that runs whenever a response comes back; your hook is called for every response.

From within the callback, you can remove, modify, or add the header, either unconditionally or only when, for example, the status code is 3xx.
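A minimal sketch of that idea, assuming a Session-level response hook; the name drop_host_on_redirect is hypothetical. It relies on how Requests 2.x currently follows redirects: the response hook runs before the next request in the chain is built, and that next request is a copy of response.request. Popping Host there means the following hop gets a Host header derived from the redirect URL instead. This is an implementation detail rather than a documented guarantee, so check it against the Requests version you use.

```python
import requests


def drop_host_on_redirect(response, *args, **kwargs):
    """Response hook (hypothetical name): strip a custom Host header on redirects.

    Assumption: when Requests follows a redirect, it copies the request that
    produced this response, so removing Host here affects the next hop.
    """
    if response.is_redirect:
        # Without an explicit Host header, the next hop's Host is taken from its URL.
        response.request.headers.pop("Host", None)
    # Returning None tells Requests to keep the response object unchanged.
    return None


session = requests.Session()
# Session hooks are merged into every request made through this session.
session.hooks["response"].append(drop_host_on_redirect)
```

Usage would then be session.get("http://google.co.uk", headers={"Host": "google.co.uk"}): the first request still carries the custom Host header, because the hook only runs once the response is back. One side effect is that response.history[0].request.headers no longer shows the custom Host value, since the hook mutates that request object in place.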

Upvotes: 0

hancar

Reputation: 797

curl does not drop the Host header. It sends the second request with the header Host: www.google.co.uk, derived from the URL that the redirect points to.
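To illustrate where that second Host value comes from, here is a small sketch (host_header_for is a hypothetical name) that derives a Host header from a URL the way Python's http.client does: the hostname, IPv6 literals wrapped in brackets, and the port only when it differs from the scheme's default. curl follows an equivalent convention.

```python
from urllib.parse import urlsplit

# Default ports are omitted from the Host header.
DEFAULT_PORTS = {"http": 80, "https": 443}


def host_header_for(url: str) -> str:
    """Return the Host header value a client would derive from url."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if ":" in host:
        # IPv6 literals must be bracketed in the Host header.
        host = f"[{host}]"
    if parts.port is not None and parts.port != DEFAULT_PORTS.get(parts.scheme):
        return f"{host}:{parts.port}"
    return host
```

For example, host_header_for("https://www.google.co.uk/") returns "www.google.co.uk", which is the value curl sent on its second request.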

As ZhongYu wrote, you don't need to specify the Host header at all, because Requests derives it from the URL. So if your goal is only to download the page, the solution is simply to omit the headers argument:

response = s.get("http://google.co.uk", allow_redirects=True)

But if your goal is HTTP checking, following the redirects manually may be the solution:

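import requests
from urllib.parse import urljoin

# Send the custom Host header on the first request only
resp = requests.get("http://google.co.uk", headers={"Host": "google.co.uk"}, allow_redirects=False)
# is_redirect covers every redirect status with a Location header, not just 301
while resp.is_redirect:
    # Location may be relative, so resolve it against the current URL
    resp = requests.get(urljoin(resp.url, resp.headers["Location"]), allow_redirects=False)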
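If you need a record of every hop for HTTP checking, the manual loop can be wrapped into a function. This is a sketch, not a Requests API: follow_redirects and its parameters are hypothetical names. The custom Host header goes on the first request only, and the loop stops after max_hops redirects by raising requests.TooManyRedirects, so a redirect loop cannot run forever.

```python
from urllib.parse import urljoin

import requests


def follow_redirects(url, first_host=None, max_hops=10, session=None):
    """Follow redirects manually, sending first_host as Host on the first hop only.

    Returns a list of (status_code, url) tuples, one per response received.
    Raises requests.TooManyRedirects after more than max_hops redirects.
    """
    if session is None:
        session = requests.Session()
    headers = {"Host": first_host} if first_host else {}
    hops = []
    for _ in range(max_hops + 1):
        resp = session.get(url, headers=headers, allow_redirects=False)
        hops.append((resp.status_code, url))
        if not resp.is_redirect:
            return hops
        # Later hops send no custom Host, so it is derived from the redirect URL.
        headers = {}
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, resp.headers["Location"])
    raise requests.TooManyRedirects(f"more than {max_hops} redirects starting at {hops[0][1]}")
```

Passing in your own Session lets you reuse connections and any settings (proxies, certificates) across checks; the returned list shows each status code and URL in the chain, ending with the first non-redirect response.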

Upvotes: 2
