Nishant

Reputation: 21914

HTTP 400 Bad Request when making HTTPS connection in Python?

I am getting an HTTP 400 Bad Request error while making an XML-RPC call over HTTPS in Python 3.8.

It seems to happen when I supply a Host header in the HTTPS request without passing skip_host=True to the preceding putrequest call. Are the skip_host argument and an explicit Host header mutually exclusive? If so, which one should I use?

import http.client

connection = http.client.HTTPSConnection("duckduckgo.com", "443")

connection.putrequest("GET", "/")  # needs skip_host=True if Host has to be supplied
connection.putheader("User-Agent", "Python/3.8")
connection.putheader("Host", "duckduckgo.com")  # needs skip_host=True to work
connection.endheaders()

response = connection.getresponse()
print(response.status, response.reason)

Update: This issue doesn't happen with all HTTPS servers, as mentioned in the official docs.

Upvotes: 2

Views: 992

Answers (1)

Nikolaos Chatzis

Reputation: 1979

Leaving skip_host at its default value (False) and also adding a Host header with putheader sends the Host header twice, and in this example with two different values. You can verify this by setting the debug level to a positive value.

>>> import http.client
>>> connection = http.client.HTTPSConnection("duckduckgo.com", "443")
>>> connection.set_debuglevel(1)
>>> connection.putrequest("GET", "/")
>>> connection.putheader("User-Agent", "Python/3.8")
>>> connection.putheader("Host", "duckduckgo.com")
>>> connection.endheaders()
send: b'GET / HTTP/1.1\r\nHost: duckduckgo.com:443\r\nAccept-Encoding: identity\r\nUser-Agent: Python/3.8\r\nHost: duckduckgo.com\r\n\r\n'
>>> 
>>> response = connection.getresponse()
reply: 'HTTP/1.1 400 Bad Request\r\n'
header: Server header: Date header: Content-Type header: Content-Length header: Connection header: X-XSS-Protection header: X-Content-Type-Options header: Referrer-Policy header: Expect-CT
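
The same duplication can be checked without touching the network. The following is only a sketch: RecordingConnection and host_header_lines are hypothetical helpers (not part of http.client) that override send() to capture the outgoing bytes. The port is passed as the string "443", as in the question, which as far as I can tell is why the automatic header carries the :443 suffix.

```python
import http.client


class RecordingConnection(http.client.HTTPSConnection):
    """Hypothetical helper: records outgoing bytes instead of using a socket."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.sent = b""

    def send(self, data):
        # Overrides the real send(), so no connection is ever opened.
        self.sent += data


def host_header_lines(raw):
    """Return the Host header lines found in a raw request head."""
    head = raw.split(b"\r\n\r\n", 1)[0]
    # Skip the request line, then match header names case-insensitively.
    return [line for line in head.split(b"\r\n")[1:]
            if line.lower().startswith(b"host:")]


connection = RecordingConnection("duckduckgo.com", "443")
connection.putrequest("GET", "/")  # skip_host defaults to False
connection.putheader("User-Agent", "Python/3.8")
connection.putheader("Host", "duckduckgo.com")
connection.endheaders()

duplicates = host_header_lines(connection.sent)
print(duplicates)
```

Two entries in the printed list mean the request carries two Host headers, matching the debug output above.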

The http.client source itself notes that sending the Host header twice can confuse some web servers. See the following comment in putrequest:

            if not skip_host:
                # this header is issued *only* for HTTP/1.1
                # connections. more specifically, this means it is
                # only issued when the client uses the new
                # HTTPConnection() class. backwards-compat clients
                # will be using HTTP/1.0 and those clients may be
                # issuing this header themselves. we should NOT issue
                # it twice; some web servers (such as Apache) barf
                # when they see two Host: headers

Your code will work either by adding skip_host=True or by not explicitly specifying a Host header. Both result in sending the Host header once.

>>> import http.client
>>> connection = http.client.HTTPSConnection("duckduckgo.com", "443")
>>> connection.putrequest("GET", "/", skip_host=True)
>>> connection.putheader("User-Agent", "Python/3.8")
>>> connection.putheader("Host", "duckduckgo.com")
>>> connection.endheaders()
>>> response = connection.getresponse()
>>> print(response.status, response.reason)
200 OK
>>> # OR
>>> connection = http.client.HTTPSConnection("duckduckgo.com", "443")
>>> connection.putrequest("GET", "/")
>>> connection.putheader("User-Agent", "Python/3.8")
>>> connection.endheaders()
>>> response = connection.getresponse()
>>> print(response.status, response.reason)
200 OK

As to which one to use: the docs seem to suggest that, unless you have a reason to set the Host header yourself with putheader, you can rely on the module sending it automatically. That is, leave skip_host at its default value (False) and do not add a Host header with putheader.
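
If you do not need the low-level putrequest/putheader calls, the higher-level request() method may be simpler. In CPython's implementation, as far as I can tell, request() passes skip_host for you when the headers dict already contains a Host entry, so the header goes out once either way. A minimal offline sketch of that behaviour, again using hypothetical helpers (RecordingConnection, host_header_lines, send_get) that capture the bytes instead of opening a socket, and assuming the integer port 443:

```python
import http.client


class RecordingConnection(http.client.HTTPSConnection):
    """Hypothetical helper: records outgoing bytes instead of using a socket."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.sent = b""

    def send(self, data):
        # Overrides the real send(), so no connection is ever opened.
        self.sent += data


def host_header_lines(raw):
    """Return the Host header lines found in a raw request head."""
    head = raw.split(b"\r\n\r\n", 1)[0]
    return [line for line in head.split(b"\r\n")[1:]
            if line.lower().startswith(b"host:")]


def send_get(headers):
    """Build a GET / request with request() and return its Host header lines."""
    connection = RecordingConnection("duckduckgo.com", 443)
    connection.request("GET", "/", headers=headers)
    return host_header_lines(connection.sent)


# Explicit Host header: request() suppresses the automatic one.
with_host = send_get({"User-Agent": "Python/3.8", "Host": "duckduckgo.com"})
# No Host header supplied: the module adds it on its own.
without_host = send_get({"User-Agent": "Python/3.8"})

print(len(with_host), len(without_host))
```

With a real connection you would call getresponse() after request(), exactly as in the snippets above.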

Upvotes: 4
