Reputation: 849
I am trying to do a plain HTTP GET (not HTTPS, i.e. the URL is http://www.example.com) using only the socket module. I recv the response, which contains everything the server sent: the headers and a gzip-encoded body. I then want to extract the gzipped body. I guess it starts at \x1f\x8b\x08, but I don't know where it ends. Any help?
Below is my raw response:
HTTP/1.1 200 OK\r\n
Header Part\r\n
\r\n
some_number_here\r\n
\x1f\x8b\x08 ......
......\r\n
0\r\n
\r\n
Upvotes: 2
Views: 548
Reputation: 30496
I bet that the header part contains a Transfer-Encoding: chunked header.
This is an HTTP/1.1 response, not an HTTP/1.0 one, and an HTTP/1.1 client is required to understand chunked transfer encoding.
You have two solutions:
- parse the chunked body yourself
- avoid HTTP/1.1 by using HTTP/1.0 in your requests, on the first line, as in GET /foo HTTP/1.0
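As a rough sketch of the HTTP/1.0 option: the function names build_http10_get and fetch_http10 below are made up for illustration, and the code assumes the server closes the connection once the response is complete, which is the usual HTTP/1.0 behaviour when no keep-alive is requested. Chunked encoding does not exist in HTTP/1.0, so the body should arrive in one piece after the blank line.

```python
import socket


def build_http10_get(host: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.0 GET request (hypothetical helper)."""
    lines = [
        f"GET {path} HTTP/1.0",   # 1.0 on the request line rules out chunked replies
        f"Host: {host}",
        "Accept-Encoding: gzip",  # still ask for a gzip-compressed body
        "",
        "",                       # the two empty items produce the final blank line
    ]
    return "\r\n".join(lines).encode("ascii")


def fetch_http10(host: str, path: str = "/", port: int = 80) -> bytes:
    """Send the request and read until the server closes the connection."""
    with socket.create_connection((host, port)) as sock:
        sock.sendall(build_http10_get(host, path))
        parts = []
        while True:
            data = sock.recv(4096)
            if not data:          # empty read: the server closed the connection
                break
            parts.append(data)
    return b"".join(parts)
```

The headers and the body can then be separated with raw.partition(b"\r\n\r\n"), and the body passed to gzip.decompress if the response carries Content-Encoding: gzip.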
The parsing is not so hard. Instead of a raw body you have a body split into parts (chunks). Each part starts with the chunk size (the some_number_here\r\n line), which is a hexadecimal number (careful: 10 means 16, 1c means 28).
Then you have the raw chunk content, followed by \r\n.
Then the next chunk.
This goes on until you reach the last chunk, which is announced with a size of 0 (0\r\n\r\n).
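The steps above can be sketched roughly as follows. This is a minimal illustration, not a complete HTTP parser: decode_chunked and extract_gzip_body are hypothetical names, the whole response is assumed to be in memory already, and the response is assumed to use both Transfer-Encoding: chunked and Content-Encoding: gzip (real code should check the headers first). Trailers after the last chunk are ignored.

```python
import gzip


def decode_chunked(body: bytes) -> bytes:
    """Reassemble a chunked body (everything after the header-ending blank line)."""
    out = bytearray()
    pos = 0
    while True:
        # The size line ends with CRLF and may carry ";extension" data we ignore.
        eol = body.index(b"\r\n", pos)
        size = int(body[pos:eol].split(b";", 1)[0].strip(), 16)  # hexadecimal!
        if size == 0:
            break  # last chunk reached
        start = eol + 2
        out += body[start:start + size]
        pos = start + size + 2  # skip the CRLF that follows the chunk data
    return bytes(out)


def extract_gzip_body(raw_response: bytes) -> bytes:
    """Split off the headers, undo the chunking, then gunzip the result."""
    _headers, _sep, body = raw_response.partition(b"\r\n\r\n")
    return gzip.decompress(decode_chunked(body))
```

Note that the gzip stream only becomes valid after the chunks are joined: the \x1f\x8b\x08 magic bytes mark its start, but the chunk size lines are interleaved with it, so searching for an end marker in the raw response does not work. If the data is truncated, body.index raises ValueError.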
Warning: the server may pause between chunks, so you have to keep reading from the socket until you have seen this last chunk.
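One way to keep reading until the last chunk, sketched under the assumption that the socket is blocking and has been wrapped with sock.makefile("rb"), and that the status line and headers have already been consumed from that stream. The name read_chunked is hypothetical. Each readline or read call simply waits until the server has sent the next piece.

```python
def read_chunked(stream) -> bytes:
    """Read chunks from a binary file-like object until the zero-size chunk."""
    out = bytearray()
    while True:
        size_line = stream.readline()      # waits for the next chunk-size line
        if not size_line:
            raise EOFError("connection closed before the last chunk")
        size = int(size_line.split(b";", 1)[0].strip(), 16)
        if size == 0:
            break                          # last chunk: the body is complete
        data = stream.read(size)           # waits until the whole chunk arrived
        if len(data) < size:
            raise EOFError("connection closed in the middle of a chunk")
        out += data
        stream.read(2)                     # discard the CRLF after the chunk data
    return bytes(out)
```

Usage would look something like read_chunked(sock.makefile("rb")) once the header lines have been read off the same stream; the result can then be handed to gzip.decompress.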
PS: do not implement HTTP on raw sockets for anything that will later go into production. There are plenty of HTTP clients available, including in Python, and getting your own to be secure and robust is a huge job.
Upvotes: 1