Reputation: 15378
I can't figure out what is wrong. ioutil.ReadAll should read the gzip-encoded body the same way it does for other URLs.
The problem can be reproduced with the URL romboutskorea.co.kr
Error:
gzip: invalid header
Code:
resp, err := http.Get("http://" + url)
if err == nil {
	defer resp.Body.Close()
	if resp.StatusCode == http.StatusOK {
		fmt.Printf("HTTP Response Status : %v\n", resp.StatusCode)
		bodyBytes, err := ioutil.ReadAll(resp.Body)
		if err != nil {
			fmt.Printf("HTTP Response Read error. Url: %v\n", url)
			log.Fatal(err)
		}
		bodyString := string(bodyBytes)
		fmt.Printf("HTTP Response Content Length : %v\n", len(bodyString))
	}
}
Upvotes: 2
Views: 7736
Reputation: 597
I had a similar issue, but I was dealing with a "hand-crafted" PHP script response which did something like this:
header('Content-Encoding: gzip');
echo @gzcompress($return);
I was trying to read the response from Go with:
gzip.NewReader(resp.Body)
But I should be doing:
zlib.NewReader(resp.Body)
From gzcompress PHP docs:
https://www.php.net/manual/en/function.gzcompress.php
'This function compresses the given string using the ZLIB data format.'
'This is not the same as gzip compression, which includes some header data. See gzencode() for gzip compression.'
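As a minimal sketch of the difference, the program below compresses a string in the zlib format (standing in for what gzcompress would send) and shows that gzip.NewReader rejects it while zlib.NewReader accepts it. The helpers decodeBody and zlibCompress are hypothetical names made up for this illustration, and the check on the first two bytes (0x1f 0x8b for gzip) is just one simple way to tell the formats apart. io.ReadAll assumes Go 1.16 or later; ioutil.ReadAll works the same way on older versions.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"compress/zlib"
	"fmt"
	"io"
)

// decodeBody is a hypothetical helper: it picks gzip when the data starts
// with the gzip magic bytes (0x1f 0x8b) and otherwise tries zlib.
// It returns the decoded bytes and the name of the format it used.
func decodeBody(data []byte) ([]byte, string, error) {
	if len(data) >= 2 && data[0] == 0x1f && data[1] == 0x8b {
		zr, err := gzip.NewReader(bytes.NewReader(data))
		if err != nil {
			return nil, "", err
		}
		defer zr.Close()
		out, err := io.ReadAll(zr)
		return out, "gzip", err
	}
	zr, err := zlib.NewReader(bytes.NewReader(data))
	if err != nil {
		return nil, "", err
	}
	defer zr.Close()
	out, err := io.ReadAll(zr)
	return out, "zlib", err
}

// zlibCompress stands in for PHP's gzcompress in this sketch.
func zlibCompress(s string) []byte {
	var buf bytes.Buffer
	w := zlib.NewWriter(&buf)
	w.Write([]byte(s)) // writes to a bytes.Buffer do not fail
	w.Close()
	return buf.Bytes()
}

func main() {
	payload := zlibCompress("hello from PHP")

	// Feeding zlib data to the gzip reader fails while parsing the header.
	_, err := gzip.NewReader(bytes.NewReader(payload))
	fmt.Println("gzip.NewReader error:", err)

	out, format, err := decodeBody(payload)
	fmt.Printf("decoded %q as %s (err: %v)\n", out, format, err)
}
```

Note that Go's default HTTP transport may try to decode a body labelled Content-Encoding: gzip on its own, so a helper like this only makes sense when that automatic decoding is switched off.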
Upvotes: 0
Reputation: 123280
The response of this site is wrong. It is claiming gzip encoding but it does not actually compress the content. The response looks something like this:
HTTP/1.1 200 OK
...
Content-Encoding: gzip
...
Transfer-Encoding: chunked
Content-Type: text/html; charset=euc-kr
8000
<html>
<head>
...
The "8000" comes from the chunked transfer encoding, and the "<html>" that follows is the beginning of the unchunked response body. Obviously the body is not compressed, even though the header claims it is.
It looks like browsers simply work around this broken site by ignoring the wrong encoding specification. Browsers actually work around a lot of broken stuff, which does not really give providers any motivation to fix these issues :( But you can see that curl will fail too:
$ curl -v --compressed http://romboutskorea.co.kr/main/index.php?
...
< HTTP/1.1 200 OK
< ...
< Content-Encoding: gzip
< ...
< Transfer-Encoding: chunked
< Content-Type: text/html; charset=euc-kr
<
* Error while processing content unencoding: invalid code lengths set
* Failed writing data
* Curl_http_done: called premature == 1
* Closing connection 0
curl: (23) Error while processing content unencoding: invalid code lengths set
And so does Python:
$ python3 -c 'import requests; requests.get("http://romboutskorea.co.kr/main/index.php?")'
...
requests.exceptions.ContentDecodingError: ('Received response with content-encoding: gzip, but failed to decode it.', error('Error -3 while decompressing data: incorrect header check'))
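One possible workaround on the Go side, sketched below under a few assumptions, is to switch off the transport's automatic gzip handling and only decompress when the body really starts with the gzip magic bytes (0x1f 0x8b). The names readBodyLenient, fetchLenient and runDemo are hypothetical, and the httptest server is only a local stand-in that mimics the broken site by sending a plain body with a Content-Encoding: gzip header; it has not been run against the real site.

```go
package main

import (
	"bufio"
	"compress/gzip"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// readBodyLenient is a hypothetical helper: it gunzips the body only if the
// header claims gzip AND the first two bytes are the gzip magic number.
// Otherwise it returns the body bytes unchanged.
func readBodyLenient(resp *http.Response) ([]byte, error) {
	br := bufio.NewReader(resp.Body)
	if resp.Header.Get("Content-Encoding") == "gzip" {
		magic, err := br.Peek(2)
		if err == nil && magic[0] == 0x1f && magic[1] == 0x8b {
			zr, err := gzip.NewReader(br)
			if err != nil {
				return nil, err
			}
			defer zr.Close()
			return io.ReadAll(zr)
		}
	}
	return io.ReadAll(br)
}

// fetchLenient disables transparent decompression so the raw body reaches
// readBodyLenient instead of failing inside the transport's gzip reader.
func fetchLenient(url string) ([]byte, error) {
	client := &http.Client{
		Transport: &http.Transport{DisableCompression: true},
	}
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	// Still advertise gzip, since we handle it ourselves.
	req.Header.Set("Accept-Encoding", "gzip")
	resp, err := client.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	return readBodyLenient(resp)
}

// runDemo starts a local server that mislabels a plain body as gzip
// and fetches it with the lenient client.
func runDemo() (string, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Encoding", "gzip") // wrong on purpose
		io.WriteString(w, "<html>plain, not gzip</html>")
	}))
	defer srv.Close()
	body, err := fetchLenient(srv.URL)
	return string(body), err
}

func main() {
	body, err := runDemo()
	fmt.Printf("body=%q err=%v\n", body, err)
}
```

The trade-off is that the caller takes over responsibility for decoding, which is roughly what browsers appear to do when they tolerate a mislabelled response.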
Upvotes: 3
Reputation: 1323953
I see
Content-Type: text/html; charset=euc-kr
Content-Encoding: gzip
Check the Body content: as in here, it could be an HTTP response where the body is first compressed with gzip and then encoded with chunked transfer encoding.
A NewChunkedReader would be needed, as in this example.
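For reference, here is a small sketch of what httputil.NewChunkedReader does with a raw chunked stream. The dechunk helper and the sample input are made up for this illustration. Keep in mind that the package documentation says NewChunkedReader is not needed by normal applications, because net/http already decodes chunking when reading response bodies, so this only applies when working with the raw bytes of a connection.

```go
package main

import (
	"fmt"
	"io"
	"net/http/httputil"
	"strings"
)

// dechunk is a hypothetical helper that strips chunked transfer encoding
// from a raw body (chunk sizes in hex, each chunk terminated by CRLF).
func dechunk(raw string) (string, error) {
	r := httputil.NewChunkedReader(strings.NewReader(raw))
	b, err := io.ReadAll(r)
	return string(b), err
}

func main() {
	// Two chunks of 6 and 7 bytes, followed by the terminating 0-length chunk.
	raw := "6\r\n<html>\r\n7\r\n</html>\r\n0\r\n\r\n"
	out, err := dechunk(raw)
	fmt.Printf("%q %v\n", out, err)
}
```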
Upvotes: 1