Extract gzip content from raw http response

Question

I am trying to do a plain HTTP GET (http, not https, i.e. the URL is http://www.example.com) using only the socket module. I recv the response, which contains everything the server transferred (headers plus a gzip-encoded body). Now I want to extract the gzipped body content. I guess this content should start at \x1f\x8b\x08, but I don't know where it should end. Any help?

Below is my raw response:

HTTP/1.1 200 OK

Header Part



some_number_here

\x1f\x8b\x08 ......
......

0

regilero · Accepted Answer

I bet that in the header part you have a Transfer-Encoding: chunked header.

This is an HTTP/1.1 response, not an HTTP/1.0 one, and understanding chunked transfer encoding is required of any HTTP/1.1 client.
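To confirm that guess, you can look at the headers before touching the body. Below is a minimal sketch, assuming the whole header block has already been received; `parse_headers` is a hypothetical helper name, and repeated headers simply overwrite each other here, which a real client should not do.

```python
def parse_headers(raw: bytes) -> dict:
    """Return the response headers as a dict with lower-cased names.

    Hypothetical helper: assumes `raw` contains the complete header block.
    """
    # Headers end at the first blank line (CRLF CRLF).
    head, _, _ = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    headers = {}
    for line in lines[1:]:  # lines[0] is the status line, e.g. "HTTP/1.1 200 OK"
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers


def is_chunked(raw: bytes) -> bool:
    """True if the response declares Transfer-Encoding: chunked."""
    return "chunked" in parse_headers(raw).get("transfer-encoding", "").lower()
```

If `is_chunked(response)` is true, the body is not raw gzip data and has to be de-chunked first.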

You have two solutions:

- tell the server you do not understand HTTP/1.1 by using HTTP/1.0 on the first line of your requests, as in GET /foo HTTP/1.0
- implement the chunked transfer parsing.
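For the first option, the request could look like the sketch below. The function name `build_http10_get` and the extra headers are illustrative assumptions, not something from the question. With HTTP/1.0 the server should not send a chunked body, so the body ends at Content-Length or when the server closes the connection.

```python
def build_http10_get(host: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.0 GET request (hypothetical helper)."""
    request = (
        f"GET {path} HTTP/1.0\r\n"
        f"Host: {host}\r\n"            # not required by HTTP/1.0, but most servers expect it
        "Accept-Encoding: gzip\r\n"    # still ask for a gzip-encoded body
        "\r\n"                         # blank line ends the request headers
    )
    return request.encode("ascii")
```

You would then send it with something like `sock.sendall(build_http10_get("www.example.com", "/foo"))`.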

The parsing is not that hard. Instead of a raw body you have a body split into parts (chunks). Each part starts with the chunk size (the some_number_here stuff), which is a hexadecimal number (warning: 10 means 16, 1c means 28).

Then you have the raw chunk content.

Then the next chunk.

Until you reach the last chunk, which is advertised with a size of 0 (the 0 line at the end of your response).
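The steps above can be sketched in Python roughly as follows. This is only an illustration, assuming the complete body (everything after the blank line that ends the headers) is already in memory; `decode_chunked` is a hypothetical name, and error handling for truncated or malformed input is left out.

```python
import gzip


def decode_chunked(body: bytes) -> bytes:
    """Reassemble a chunked HTTP body into the raw payload.

    Assumes `body` is complete, i.e. it includes the final 0-size chunk.
    """
    out = bytearray()
    pos = 0
    while True:
        # The size line ends with CRLF; anything after ';' is a chunk extension.
        eol = body.index(b"\r\n", pos)
        size = int(body[pos:eol].split(b";", 1)[0].strip(), 16)  # hexadecimal!
        if size == 0:
            break  # last chunk reached
        start = eol + 2
        out += body[start:start + size]
        pos = start + size + 2  # skip the CRLF that follows the chunk data
    return bytes(out)


def extract_gzip_body(body: bytes) -> bytes:
    """De-chunk the body, then gunzip it."""
    return gzip.decompress(decode_chunked(body))
```

The reassembled payload is what starts with \x1f\x8b\x08, so the gzip stream ends where the last non-empty chunk ends, not at some marker inside the data.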

Warning: the server may take some time between chunks, so you have to keep reading from the socket until you see this last chunk.

PS: do not try to implement HTTP over raw sockets for anything that will go into production later. There are plenty of HTTP clients available, even in Python, and it is a huge job to get something secure and robust.
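A read loop along these lines keeps calling recv until the terminating chunk has arrived. It is a sketch under the assumption that the response is known to be chunked; `chunked_complete` and `recv_chunked_response` are hypothetical names, `sock` is any object with a socket-like `recv` method, and the body is re-scanned after every recv, which is simple but not efficient.

```python
def chunked_complete(body: bytes) -> bool:
    """True once `body` contains the 0-size chunk and the final blank line."""
    pos = 0
    while True:
        eol = body.find(b"\r\n", pos)
        if eol == -1:
            return False  # the next size line has not fully arrived yet
        size = int(body[pos:eol].split(b";", 1)[0].strip(), 16)
        if size == 0:
            # After the last chunk come optional trailers, then an empty line.
            return body.find(b"\r\n\r\n", eol) != -1
        pos = eol + 2 + size + 2  # size line CRLF + data + data CRLF
        if pos > len(body):
            return False  # the chunk data is still incomplete


def recv_chunked_response(sock, bufsize: int = 4096) -> bytes:
    """Read from `sock` until the whole chunked response has been received."""
    data = bytearray()
    while True:
        part = sock.recv(bufsize)
        if not part:
            break  # the server closed the connection
        data += part
        _, sep, body = bytes(data).partition(b"\r\n\r\n")
        if sep and chunked_complete(body):
            break  # headers are complete and the last chunk was seen
    return bytes(data)
```

Walking the chunk sizes, rather than just checking whether the data ends with 0 followed by an empty line, avoids stopping early when those bytes happen to occur inside the compressed data.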
