Reputation: 1711
I'm using the Python requests library for some HTTP checking on an application. I need to send a custom Host header on the initial request, but it should not be sent when following redirects, where it causes a problem.
I've looked through the requests docs, but I can't see a way to have requests drop the request headers when following redirects.
Here is an example of my problem:
import requests
from requests.structures import CaseInsensitiveDict

s = requests.Session()
# Host header that the CDN needs on the first request
request_headers = CaseInsensitiveDict()
request_headers['host'] = 'google.co.uk'
response = s.get("http://google.co.uk", allow_redirects=True, headers=request_headers)
In this case google.co.uk redirects to https://www.google.co.uk, but requests gets stuck in a redirect loop because it keeps sending the Host header set to 'google.co.uk' even after it follows the redirect.
I always need to set the Host header manually on the first request, because the request goes through a CDN that uses this header to determine which site it is serving. Removing it from the initial request is not an option.
Here is an equivalent curl command, which does drop the Host header after the initial request. This is the behaviour I would like to see (and would expect) from Python requests:
curl -H "Host: google.co.uk" http://google.co.uk -L -o /dev/null
Upvotes: 2
Views: 4992
Reputation: 1
Very late reply. I just came across this post looking for something else. It may help somebody looking for an answer.
Look into the requests "hook" mechanism. You can register a callback that runs when a response comes back (each response, including every redirect response, calls your hook): https://2.python-requests.org/en/master/user/advanced/#event-hooks
From within the callback, you can remove, modify, or add headers, either unconditionally or only when the status code is 3xx, for example.
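A minimal sketch of such a hook is below. The function name `drop_host_on_redirect` is made up for illustration. It assumes that requests builds the follow-up request from a copy of the request it just sent, so removing the header from `response.request` before the redirect is followed keeps it out of the next hop. That is an implementation detail rather than a documented guarantee, so check it against the requests version you use.

```python
def drop_host_on_redirect(response, *args, **kwargs):
    """Response hook: forget the custom Host header once a redirect arrives."""
    if response.is_redirect:
        # response.request is the PreparedRequest that was just sent.
        # Assumption: requests copies it to build the follow-up request,
        # so popping the header here keeps it out of the next hop.
        response.request.headers.pop("Host", None)
    return response
```

It could then be registered per request, for example `s.get("http://google.co.uk", headers={"Host": "google.co.uk"}, hooks={"response": drop_host_on_redirect})`.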
Upvotes: 0
Reputation: 797
curl does not drop the Host header. It sends a second request with the header Host: www.google.co.uk (created from the URL to which the redirect leads).
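To illustrate what "created from the URL" means, here is a rough sketch of how a client derives the Host value when none is set explicitly. The helper `default_host_header` and the `DEFAULT_PORTS` table are hypothetical names for this example, not part of curl or requests, and IPv6 literals are ignored for brevity.

```python
from urllib.parse import urlsplit

# Ports that are implied by the scheme and therefore left out of Host.
DEFAULT_PORTS = {"http": 80, "https": 443}


def default_host_header(url):
    """Return the Host value a client would derive from the URL itself."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    # The port is only included when it differs from the scheme's default.
    if parts.port and parts.port != DEFAULT_PORTS.get(parts.scheme):
        host = f"{host}:{parts.port}"
    return host
```

For the redirect target in the question, `default_host_header("https://www.google.co.uk/")` yields the host name of the new URL, not the value that was passed manually on the first request.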
As ZhongYu wrote, you don't need to specify the Host header. So if your goal is only to download the page, the solution is simply to omit the headers argument:
response = s.get("http://google.co.uk", allow_redirects=True)
But if your goal is to do some HTTP checking, maybe this would be a solution:
import requests

# Follow the redirects by hand, one request at a time.
resp = requests.get("http://google.co.uk", allow_redirects=False)
while resp.status_code == 301:
    resp = requests.get(resp.headers['location'], allow_redirects=False)
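If the custom Host header is still required on the first request, as in the question, the same manual loop can be extended along these lines. This is only a sketch: `get_with_initial_host`, `REDIRECT_CODES`, and the `max_redirects` limit are names chosen for this example, every hop is re-issued as a GET, and `session` is assumed to be a `requests.Session` (or anything with a compatible `get` method).

```python
from urllib.parse import urljoin

# Status codes treated as redirects in this sketch.
REDIRECT_CODES = {301, 302, 303, 307, 308}


def get_with_initial_host(session, url, host, max_redirects=10):
    """GET url, sending a custom Host header on the first request only."""
    headers = {"Host": host}
    for _ in range(max_redirects + 1):
        resp = session.get(url, headers=headers, allow_redirects=False)
        location = resp.headers.get("Location")
        if resp.status_code not in REDIRECT_CODES or not location:
            return resp
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, location)
        # Later hops carry no custom Host; it is then derived from the URL.
        headers = {}
    raise RuntimeError(f"more than {max_redirects} redirects")
```

With requests installed, it could be called as `get_with_initial_host(requests.Session(), "http://google.co.uk", "google.co.uk")`.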
Upvotes: 2