Mediator

Reputation: 15378

HTTP response throws error "gzip: invalid header"

I can't understand what is wrong. ioutil.ReadAll handles the gzip-encoded responses from other URLs without a problem.

Can reproduce with URL: romboutskorea.co.kr

Error:

gzip: invalid header

Code:

resp, err := http.Get("http://" + url)
if err == nil {
    defer resp.Body.Close()

    if resp.StatusCode == http.StatusOK {
        fmt.Printf("HTTP Response Status : %v\n", resp.StatusCode)
        bodyBytes, err := ioutil.ReadAll(resp.Body)
        if err != nil {
            fmt.Printf("HTTP Response Read error. Url: %v\n", url)
            log.Fatal(err)
        }
        bodyString := string(bodyBytes)

        fmt.Printf("HTTP Response Content Length : %v\n", len(bodyString))
    }
}

Upvotes: 2

Views: 7736

Answers (3)

Matias

Reputation: 597

I had a similar issue, but I was dealing with a "hand-crafted" PHP script response which did something like this:

header('Content-Encoding: gzip');
echo @gzcompress($return);

I was trying to read the response from GO with:

gzip.NewReader(resp.Body)

But I should have been using:

zlib.NewReader(resp.Body)

From gzcompress PHP docs:

https://www.php.net/manual/en/function.gzcompress.php

'This function compresses the given string using the ZLIB data format.'
'This is not the same as gzip compression, which includes some header data. See gzencode() for gzip compression.'

Upvotes: 0

Steffen Ullrich

Reputation: 123280

This site's response is broken: it claims gzip encoding but does not actually compress the content. The response looks something like this:

HTTP/1.1 200 OK
...
Content-Encoding: gzip
...
Transfer-Encoding: chunked
Content-Type: text/html; charset=euc-kr

8000
<html>
<head>
...

The "8000" is a chunk size from the chunked transfer encoding, and what follows it is the response body itself, which is plainly uncompressed even though the header claims otherwise.

It looks like browsers simply work around this broken site by ignoring the bogus encoding declaration. Browsers work around a lot of broken behavior, which does not exactly motivate providers to fix these issues :( But you can see that curl fails too:

$ curl -v --compressed http://romboutskorea.co.kr/main/index.php?
...
< HTTP/1.1 200 OK
< ...
< Content-Encoding: gzip
< ...
< Transfer-Encoding: chunked
< Content-Type: text/html; charset=euc-kr
< 
* Error while processing content unencoding: invalid code lengths set
* Failed writing data
* Curl_http_done: called premature == 1
* Closing connection 0
curl: (23) Error while processing content unencoding: invalid code lengths set

And so does Python:

$ python3 -c 'import requests; requests.get("http://romboutskorea.co.kr/main/index.php?")'
...
requests.exceptions.ContentDecodingError: ('Received response with content-encoding: gzip, but failed to decode it.', error('Error -3 while decompressing data: incorrect header check'))

Upvotes: 3

VonC

Reputation: 1323953

I see

Content-Type: text/html; charset=euc-kr
Content-Encoding: gzip

Check the body content: as reported in similar cases, it could be an HTTP response whose body is first compressed with gzip and then wrapped in chunked transfer encoding.

A httputil.NewChunkedReader would be needed to strip the chunked layer before decompressing.

Upvotes: 1
