Exact copy of http gzipped response into a string

Question

I need help.

I'm trying to get the content of a website whose Content-Encoding is gzip, using dmd v2.066.1 on Windows. This is the address I'm testing with: "http://diaboli.pl/test2.html".

My HTTP request is:

```
GET /test2.html HTTP/1.1
Host: diaboli.pl
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: pl,en-US;q=0.7,en;q=0.3
Accept-Encoding: gzip, deflate
User-Agent: My Browser
Referer: http://google.pl
DNT: 1
```
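Every line of a raw HTTP request ends with CRLF (`\r\n`), and an empty line terminates the header block. A minimal D sketch of that layout follows; `buildGetRequest` is a hypothetical helper (not part of Phobos), and the HTTP version is left as a parameter rather than hard-coded.

```d
/// Hypothetical helper (not part of Phobos): builds a raw GET request.
/// Each line ends with CRLF and an empty line terminates the headers.
string buildGetRequest(string host, string path,
                       const string[] headerLines = null,
                       string httpVersion = "HTTP/1.1")
{
    string req = "GET " ~ path ~ " " ~ httpVersion ~ "\r\n";
    req ~= "Host: " ~ host ~ "\r\n";
    foreach (h; headerLines)
        req ~= h ~ "\r\n";   // each entry is a complete "Name: value" line
    req ~= "\r\n";           // blank line: end of the header block
    return req;
}
```

For example, `buildGetRequest("diaboli.pl", "/test2.html", ["Accept-Encoding: gzip, deflate"])` yields the first lines of the request shown above, followed by the terminating empty line.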

The server response is:

```
HTTP/1.1 200 OK
Date: Sat, 24 Jan 2015 23:02:00 GMT
Server: Apache
Last-Modified: Sat, 24 Jan 2015 22:48:44 GMT
ETag: "5c468ad-83f-50d6db511eb00"
Accept-Ranges: bytes
Vary: Accept-Encoding,User-Agent
Content-Encoding: gzip
Content-Length: 942
Content-Type: text/html
```

(followed by the gzip-compressed body, which the console displays as a long run of unreadable binary characters)

As you can see, the content is gzip-encoded. The server response above is printed to the cmd console with the write() function, character by character. The problem is that I can't make an exact copy of the response in a string. If I try, I get this result:

```
HTTP/1.1 200 OK
Date: Sat, 24 Jan 2015 23:02:00 GMT
Server: Apache
Last-Modified: Sat, 24 Jan 2015 22:48:44 GMT
ETag: "5c468ad-83f-50d6db511eb00"
Accept-Ranges: bytes
Vary: Accept-Encoding,User-Agent
Content-Encoding: gzip
Content-Length: 942
Content-Type: text/html

▼ő
```

I can determine the length of the content, and it equals the Content-Length header value, but I can see that it's not the same string as the original printed character by character.

Interestingly, I can decompress that bad content string with the zlib uncompress() function: it doesn't return a zlib data error, just truncated decompressed content. Of course, browsers like Firefox or IE display the complete decompressed content without problems.
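One way to keep the body intact is never to treat it as text: keep the whole response as `ubyte[]`, split it at the first empty line, and read `Content-Length` from the header part only. A minimal sketch, assuming the complete response is already in memory (for example collected from `Socket.receive`) and is not chunked; `RawResponse`, `splitResponse` and `contentLength` are hypothetical names, not Phobos APIs.

```d
import std.algorithm.searching : countUntil;
import std.conv : to;
import std.string : indexOf, lineSplitter, strip;
import std.uni : toLower;

/// Hypothetical result type: headers as text, body as raw bytes.
struct RawResponse
{
    string headers;          // status line + header lines (ASCII text)
    const(ubyte)[] content;  // body bytes, never converted to a string
}

/// Splits a complete raw response at the first empty line ("\r\n\r\n").
RawResponse splitResponse(const(ubyte)[] raw)
{
    static immutable ubyte[] headerEnd = [13, 10, 13, 10]; // "\r\n\r\n"
    auto idx = raw.countUntil(headerEnd);
    if (idx < 0)
        throw new Exception("end of headers not found");
    return RawResponse(cast(string) raw[0 .. idx].idup, raw[idx + 4 .. $]);
}

/// Returns the Content-Length value from the header text, or -1 if absent.
long contentLength(string headers)
{
    foreach (line; headers.lineSplitter)
    {
        auto colon = line.indexOf(':');
        if (colon > 0 && line[0 .. colon].strip.toLower == "content-length")
            return line[colon + 1 .. $].strip.to!long;
    }
    return -1;
}
```

Because `content` is a byte slice, bytes such as `\r`, `\n` or 0x8B pass through untouched, and its length can be compared directly with `contentLength(headers)`.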

I'm connecting to the server like this:

```
import std.stdio, std.string, std.conv, std.socket, std.stream, std.socketstream, std.zlib;

void main()
{
    ushort port=80; string domain="diaboli.pl";
    string request_uri="/test2.html"; int[] pos; string request; string buffer; string znak; string line;
    int contentlength=-1; int[] postab; string bodybuffer; string headerbuffer; int readingbody=0;
    std.zlib.UnCompress u; const(void)[] udata;

    Socket sock = new TcpSocket(new InternetAddress(domain, port));
    Stream ss   = new SocketStream(sock);

    request="GET " ~ request_uri ~ " HTTP/1.1\r\n";
    request~="Host: " ~ domain ~ "\r\n";
    request~="Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\r\n";
    request~="Accept-Language: pl,en-US;q=0.7,en;q=0.3\r\n";
    request~="Accept-Encoding: gzip, deflate\r\n";
    request~="User-Agent: My Browser\r\n";
    request~="Referer: http://google.pl\r\n";
    request~="DNT: 1\r\n";
    request~="\r\n";

    writeln("HTTP request:\n---");
    writeln(request);
    writeln("---");

    ss.writeString(request);

    writeln("\nAll response from the server character by character:\n---");
    line="";
    while (1)
    {
        if (readingbody==1) readingbody=2; //the way to separate headers and the content - first part.

        znak = to!string(ss.getc());
        if (ss.eof()) break;
        line~=znak;
        //if (readingbody==2)
        write(znak);

        if (znak=="\n")
        {
            if (strpos(line,"Content-Length: ")>-1)
            {
                postab ~= strpos(line,"\r");
                postab ~= strpos(line,"\n");
                contentlength=to!int(substr(line,16,postab.sort[0]-16));
            }

            if (readingbody==0 && line=="\r\n") readingbody=1;
            line="";
        }

        buffer ~= znak;

        //the way to separate headers and the content - second part.
        if (readingbody==0 && line=="\r\n") readingbody=1;
        if (readingbody==2) bodybuffer ~= znak;
        else headerbuffer ~= znak;
    }

    sock.close();

    writeln("\n---");

    write("Content-Length="); writeln(contentlength); //This is the Content-Length determined from the HTTP Content-Length header.
    write("bodybuffer.length="); writeln(bodybuffer.length); //This is the length of the content string.

    writeln("\nAll response copied into the string:\n---");
    writeln(buffer);

    writeln("---\nOnly content:\n---");
    writeln(bodybuffer);

    writeln("---\nUncompressed:\n---");
    u = new UnCompress(HeaderFormat.determineFromData);
    udata = u.uncompress(bodybuffer);
    writeln(cast(string)udata);
}

//These are my simple text processing functions, similar to PHP's.
int strpos(string str,string tofind,int caseinsensitive=0)
{
    int pos=-1;
    if (caseinsensitive==1)
    {
        str=toUpper(str);
        tofind=toUpper(tofind);
    }
    if (str.length>=tofind.length)
    {
        for(int i=0;i<str.length;i++)
        {
            if (i+tofind.length>str.length) break;
            if (str[i..i+tofind.length]==tofind)
            {
                pos=i;
                break;
            }
        }
    }
    return pos;
}

string substr(string str,int pos, int offset)
{
    string substring="";
    if (str.length>0 && pos>-1 && offset>0)
    {
        substring=str[pos..pos+offset];
    }
    return substring;
}
```

Vladimir Panteleev · Accepted Answer

There are three problems with your code:

1. You use Stream.getc, which does newline conversions. This will corrupt binary data. You can fix this by replacing:
```
znak = to!string(ss.getc());
```
with:
```
char c; ss.readBlock(&c, 1); znak = to!string(c);
```
Better still, avoid std.stream entirely: it is ancient code waiting to be replaced.

2. You specify HTTP version 1.1, so the server sends back the content with Transfer-Encoding: chunked. Your program cannot handle this transfer encoding. You can change the protocol version to 1.0.
3. When using the std.zlib classes, you must call flush after piping through all the data. Add this line:
```
udata ~= u.flush();
```
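The third fix can be wrapped as a standalone function: feed the whole compressed body to `UnCompress`, then append whatever `flush()` returns. This is a sketch against current Phobos `std.zlib` (the question targets dmd 2.066.1, whose behaviour may differ), and `gunzipAll` is a hypothetical name. It takes raw bytes, so the body should come from a binary-safe buffer rather than a string built with `getc`.

```d
import std.zlib : UnCompress, HeaderFormat;

/// Hypothetical helper: decompresses a complete gzip (or zlib) body.
/// uncompress() may not return all of the output; flush() emits the rest.
const(void)[] gunzipAll(const(void)[] compressed)
{
    auto u = new UnCompress(HeaderFormat.determineFromData);
    const(void)[] result = u.uncompress(compressed);
    result ~= u.flush(); // collect any output zlib is still holding
    return result;
}
```

The decompressed HTML can then be printed with `writeln(cast(const(char)[]) gunzipAll(bodyBytes))`, where `bodyBytes` stands for the body read as bytes.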

With these changes, your program works fine for me.
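If you would rather keep HTTP/1.1 than switch to 1.0, the chunked body has to be decoded before it is handed to zlib. A minimal sketch under these assumptions: the whole response has already been read as bytes, the headers have been stripped, and `decodeChunked` is a hypothetical helper (not in Phobos) that ignores trailer fields.

```d
import std.algorithm.searching : countUntil;
import std.conv : to;
import std.string : indexOf, strip;

/// Decodes a "Transfer-Encoding: chunked" body (RFC 7230, section 4.1).
/// Each chunk is: hex size [;extension] CRLF, <size> bytes, CRLF.
/// A chunk of size 0 ends the body; trailer fields after it are ignored.
ubyte[] decodeChunked(const(ubyte)[] data)
{
    static immutable ubyte[] crlf = [13, 10];
    ubyte[] result;
    while (true)
    {
        auto lineEnd = data.countUntil(crlf);
        if (lineEnd < 0)
            throw new Exception("chunked: missing size line");
        auto sizeLine = cast(const(char)[]) data[0 .. lineEnd];
        auto semi = sizeLine.indexOf(';');    // drop a chunk extension, if any
        if (semi >= 0)
            sizeLine = sizeLine[0 .. semi];
        auto size = to!size_t(sizeLine.strip, 16);
        data = data[lineEnd + 2 .. $];
        if (size == 0)
            break;                            // last chunk
        if (data.length < size + 2 || data[size .. size + 2] != crlf)
            throw new Exception("chunked: truncated or malformed chunk");
        result ~= data[0 .. size];
        data = data[size + 2 .. $];           // skip chunk data and its CRLF
    }
    return result;
}
```

The result is the gzip stream itself, which can then go through `UnCompress` and `flush()` as in the third point.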
