user3725459

Reputation: 424

Apparently well-formed HTTP request is rejected as malformed when sent over a socket

I'm working with socket operations and have written a basic interception proxy in Python. It works fine with most hosts, but some return 400 Bad Request responses.

These requests do not look malformed though. Here's one:

GET http://www.baltour.it/ HTTP/1.1
Host: www.baltour.it
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:28.0) Gecko/20100101 Firefox/28.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: keep-alive

Same request, raw:

GET http://www.baltour.it/ HTTP/1.1\r\nHost: www.baltour.it\r\nUser-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:28.0) Gecko/20100101 Firefox/28.0\r\nAccept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\r\nAccept-Language: en-US,en;q=0.5\r\nAccept-Encoding: gzip, deflate\r\nConnection: keep-alive\r\n\r\n

The code I use to send the request is the most basic socket operation (I don't think the problem lies there, since it works fine with most hosts):

socket_client.send(request_raw)

The response is read with socket_client.recv (no problems there: the response is well-formed, though its status is 400).

Any ideas?

Upvotes: 0

Views: 1049

Answers (1)

Martijn Pieters

Reputation: 1125138

When you are not talking to a proxy, you are not supposed to put the http://hostname part in the request line; see section 5.1.2 of RFC 2616, the HTTP/1.1 spec:

The most common form of Request-URI is that used to identify a resource on an origin server or gateway. In this case the absolute path of the URI MUST be transmitted (see section 3.2.1, abs_path) as the Request-URI, and the network location of the URI (authority) MUST be transmitted in a Host header field.

Here abs_path is the absolute path part of the request URI, not the full absolute URI itself.

E.g. the server expects you to send:

GET / HTTP/1.1
Host: www.baltour.it

A receiving server should be tolerant of the incorrect behaviour, however, so the server seems to violate the RFC here as well. Further on in the same section it reads:

To allow for transition to absoluteURIs in all requests in future versions of HTTP, all HTTP/1.1 servers MUST accept the absoluteURI form in requests, even though HTTP/1.1 clients will only generate them in requests to proxies.
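
Since some servers evidently do not accept the absolute form, a proxy can rewrite the request line before forwarding it. The following is only a minimal sketch, assuming Python 3 and that the raw request is a bytes object; to_origin_form is a hypothetical helper name, not something from the question's code:

```python
from urllib.parse import urlsplit


def to_origin_form(request_raw: bytes) -> bytes:
    """Rewrite 'GET http://host/path HTTP/1.1' to 'GET /path HTTP/1.1'.

    Hypothetical helper: headers and body are passed through unchanged.
    """
    # Split off the request line; everything after the first CRLF is kept as-is.
    request_line, sep, rest = request_raw.partition(b"\r\n")
    parts = request_line.split(b" ")
    if len(parts) != 3:
        return request_raw  # not a recognisable request line; leave it alone
    method, target, version = parts

    # Only rewrite targets given as an absolute http(s) URI.
    if not target.lower().startswith((b"http://", b"https://")):
        return request_raw

    url = urlsplit(target)
    path = url.path or b"/"  # an empty path means the root resource
    if url.query:
        path += b"?" + url.query
    return b" ".join([method, path, version]) + sep + rest
```

With the request from the question, to_origin_form(request_raw) would produce a request line of GET / HTTP/1.1 and leave the Host header untouched, so the result could be passed to socket_client.send in place of the original.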

Upvotes: 1
