Desolator

Reputation: 22769

okhttp 3: how to decompress gzip/deflate response manually using Java/Android

I know that the okhttp3 library adds the header Accept-Encoding: gzip by default and decodes the response automatically for us.

The problem is that I'm dealing with a host that only accepts a header like Accept-Encoding: gzip, deflate; if I don't add the deflate part, the request fails. But when I add that header to the okhttp client manually, the library no longer decompresses the response for me.

I've tried several ways to take the response and decompress it manually, but I always end up with an exception, e.g. java.util.zip.ZipException: Not in GZIP format. Here's what I've tried so far:

// Decompressor
public static String decompressGZIP(InputStream inputStream) throws IOException
{
    InputStream bodyStream = new GZIPInputStream(inputStream);
    ByteArrayOutputStream outStream = new ByteArrayOutputStream();
    byte[] buffer = new byte[4096];
    int length;
    // read() returns -1 at end of stream
    while ((length = bodyStream.read(buffer)) != -1)
    {
        outStream.write(buffer, 0, length);
    }

    // Decode explicitly as UTF-8 (an assumption) rather than the platform default charset
    return new String(outStream.toByteArray(), java.nio.charset.StandardCharsets.UTF_8);
}


//run scraper
scrape(api, new Callback()
{
    // Something went wrong
    @Override
    public void onFailure(@NonNull Call call, @NonNull IOException e)
    {
    }

    @Override
    public void onResponse(@NonNull Call call, @NonNull Response response) throws IOException
    {
        if (response.isSuccessful())
        {
            try
            {
                InputStream responseBodyBytes = response.body().byteStream();
                returnedObject = decompressGZIP(responseBodyBytes);

                if (returnedObject != null)
                {
                    String htmlResponse = returnedObject.toString();
                }
            }
            catch (ProtocolException e){}

            if(response != null) response.close();
        }
    }
});



private Call scrape(Map<?, ?> api, Callback callback)
{
    MediaType JSON = MediaType.parse("application/json; charset=utf-8");
    String method = (String) api.get("method");
    String url = (String) api.get("url");
    Request.Builder requestBuilder = new Request.Builder().url(url);
    RequestBody requestBody;

    requestBuilder.header("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:61.0) Gecko/20100101 Firefox/61.0");
    requestBuilder.header("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8");
    requestBuilder.header("Accept-Language", "en-US,en;q=0.5");
    requestBuilder.header("Accept-Encoding", "gzip, deflate");
    requestBuilder.header("Connection", "keep-alive");
    requestBuilder.header("Upgrade-Insecure-Requests", "1");
    requestBuilder.header("Cache-Control", "max-age=0");

    Request request = requestBuilder.build();

    Call call = client.newCall(request);
    call.enqueue(callback);

    return call;
}

Just a note: the response always comes back with the headers Content-Encoding: gzip and Transfer-Encoding: chunked.

One more thing, I've also tried the solution in this topic and it still fails with D/OkHttp: java.io.IOException: ID1ID2: actual 0x00003c68 != expected 0x00001f8b.
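
For reference, the two bytes in that message are informative: the gzip magic number is 0x1f 0x8b, while 0x3c 0x68 is ASCII for "<h", which suggests the body was plain text (for example HTML) rather than gzip data. A minimal sketch, using only java.util.zip, that peeks at the first two bytes and gunzips only when the magic number is present. The class and method names are hypothetical, and UTF-8 is an assumed charset:

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipSniffer {
    // Hypothetical helper: returns the body as text, gunzipping only when the
    // stream really starts with the gzip magic bytes 0x1f 0x8b.
    static String readMaybeGzipped(InputStream raw) throws IOException {
        BufferedInputStream in = new BufferedInputStream(raw);
        in.mark(2);               // remember the start so the peek can be undone
        int b1 = in.read();
        int b2 = in.read();
        in.reset();               // rewind to the first byte
        boolean gzipped = (b1 == 0x1f && b2 == 0x8b);

        InputStream body = gzipped ? new GZIPInputStream(in) : in;
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int n;
        while ((n = body.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        // Assumption: the payload is UTF-8 text
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        byte[] plain = "<html>hello</html>".getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream compressed = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(compressed)) {
            gz.write(plain);
        }
        // Both calls print the same text, whether or not the input was gzipped
        System.out.println(readMaybeGzipped(new ByteArrayInputStream(compressed.toByteArray())));
        System.out.println(readMaybeGzipped(new ByteArrayInputStream(plain)));
        // The bytes from the error message, decoded as ASCII
        System.out.println(new String(new byte[] {0x3c, 0x68}, StandardCharsets.US_ASCII));
    }
}
```

With OkHttp, the same helper could be fed response.body().byteStream(); checking the Content-Encoding response header first is the more conventional test, and the byte peek is only a fallback for servers whose headers cannot be trusted.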

Any help would be appreciated.

Upvotes: 10

Views: 16930

Answers (5)

Richard Maw

Reputation: 41

I had to implement this myself recently and found that the existing answers had a few errors, so here's my take on how it works today.

import java.util.Collections;
import java.util.zip.Inflater;
import okhttp3.Headers;
import okhttp3.Interceptor;
import okhttp3.MediaType;
import okhttp3.OkHttpClient;
import okhttp3.Request;
import okhttp3.RequestBody;
import okhttp3.Response;
import okhttp3.ResponseBody;
import okio.BufferedSource;
import okio.GzipSource;
import okio.InflaterSource;
import okio.Okio;

var client = new OkHttpClient.Builder()
    .addInterceptor(
        (Interceptor.Chain chain) -> {
          var oldRequest = chain.request();

          // If the caller has passed their own Accept-Encoding,
          // they are indicating that they expect to handle decoding themselves.
          if (oldRequest.header("Accept-Encoding") != null) {
            return chain.proceed(oldRequest);
          }

          // Augment request saying we accept multiple content encodings
          var newHeaders =
              oldRequest
                  .headers()
                  .newBuilder()
                  .add("Accept-Encoding", "deflate")
                  .add("Accept-Encoding", "gzip")
                  .build();

          var newRequest = oldRequest.newBuilder().headers(newHeaders).build();

          var oldResponse = chain.proceed(newRequest);

          // Replace the response's request with the original one
          var responseBuilder = oldResponse.newBuilder().request(oldRequest);

          // We might not have a body to decompress
          var body = oldResponse.body();
          if (body != null) {
            BufferedSource source = body.source();
            // The body may have been wrapped in an arbitrary encoding sequence
            // and the server returns them in the order it encoded them
            // so we wrap them with decoders in reverse order.
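            // Headers.values() returns an unmodifiable list, so walk it backwards
            // instead of reversing it in place.
            var encodings = oldResponse.headers().values("Content-Encoding");
            for (int i = encodings.size() - 1; i >= 0; i--) {
              var encoding = encodings.get(i);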
              if ("deflate".equalsIgnoreCase(encoding)) {
                var inflater = new Inflater(true);
                source = Okio.buffer(new InflaterSource(source, inflater));
              } else if ("gzip".equalsIgnoreCase(encoding)) {
                source = Okio.buffer(new GzipSource(source));
              }
            }

            // Strip encoding and length headers as we've already handled them
            var strippedHeaders =
                oldResponse
                    .headers()
                    .newBuilder()
                    .removeAll("Content-Encoding")
                    .removeAll("Content-Length")
                    .build();
            responseBuilder.headers(strippedHeaders);
            // Reuse the already-parsed Content-Type; it is null when the header is absent
            var contentType = body.contentType();
            // Construct a new body with unknown length (-1): the decoded size isn't known up front
            var newBody = ResponseBody.create(contentType, -1L, source);
            responseBuilder.body(newBody);
          }

          return responseBuilder.build();
        })
    .build();
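
One detail the interceptor above does not cover: Content-Encoding may also arrive as a single comma-separated value (for example "deflate, gzip") rather than as repeated headers. A minimal sketch of flattening both forms into the order in which decoders should be applied, using only the standard library. The class and method names are hypothetical, and feeding it the result of headers().values("Content-Encoding") is an assumption about how it would be wired in:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

class ContentEncodings {
    // Hypothetical helper: takes every Content-Encoding header value as received
    // and returns the codings in the order they must be decoded.
    static List<String> decodeOrder(List<String> headerValues) {
        // Copy into a fresh list so the input may be unmodifiable
        List<String> codings = new ArrayList<>();
        for (String value : headerValues) {
            for (String token : value.split(",")) {
                String coding = token.trim().toLowerCase(Locale.ROOT);
                if (!coding.isEmpty()) {
                    codings.add(coding);
                }
            }
        }
        // Codings are listed in the order they were applied, so decode last-first
        Collections.reverse(codings);
        return codings;
    }

    public static void main(String[] args) {
        // Body was deflated first, then gzipped: gunzip first, then inflate
        System.out.println(decodeOrder(Arrays.asList("deflate, gzip"))); // prints [gzip, deflate]
    }
}
```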

Upvotes: 0

shengbin_xu

Reputation: 168

Thank you for Aksenov Vladimir's reply. Your answer saved me a lot of time. Everything works fine after I upgraded okhttp from 3.x to 4.11.

Here are some additional details:

  1. When users explicitly include the "Accept-Encoding: gzip" header, they need to handle the decompression of the response content themselves.
  2. When users do not explicitly specify "Accept-Encoding" or "Range", okhttp automatically adds "Accept-Encoding: gzip" to the request headers and automatically decompresses the response content (if "Content-Encoding" is gzip).

The relevant code is in okhttp3.internal.http.BridgeInterceptor:

// If we add an "Accept-Encoding: gzip" header field we're responsible for also decompressing
// the transfer stream.
var transparentGzip = false
if (userRequest.header("Accept-Encoding") == null && userRequest.header("Range") == null) {
  transparentGzip = true
  requestBuilder.header("Accept-Encoding", "gzip")
}

if (transparentGzip &&
    "gzip".equals(networkResponse.header("Content-Encoding"), ignoreCase = true) &&
    networkResponse.promisesBody()) {
  val responseBody = networkResponse.body
  if (responseBody != null) {
    val gzipSource = GzipSource(responseBody.source())
    val strippedHeaders = networkResponse.headers.newBuilder()
        .removeAll("Content-Encoding")
        .removeAll("Content-Length")
        .build()
    responseBuilder.headers(strippedHeaders)
    val contentType = networkResponse.header("Content-Type")
    responseBuilder.body(RealResponseBody(contentType, -1L, gzipSource.buffer()))
  }
}

Upvotes: 1

fox z

Reputation: 1

This happens because okhttp's transparent decompression only handles gzip, not deflate.

See the check in BridgeInterceptor.java (or BridgeInterceptor.kt):

    if (transparentGzip &&
    "gzip".equals(networkResponse.header("Content-Encoding"), ignoreCase = true) &&
    networkResponse.promisesBody()) {
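
Since that check only covers gzip, a deflate body has to be inflated by the caller. A minimal sketch using only java.util.zip follows. HTTP defines "deflate" as zlib-wrapped data, but some servers send a raw deflate stream instead, so this hypothetical helper tries the zlib form first and falls back to raw. The class and method names are illustrative, and the whole body is assumed to fit in memory:

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

class DeflateDecoder {
    // nowrap = false expects a zlib header and checksum; nowrap = true expects raw deflate
    static byte[] inflate(byte[] data, boolean nowrap) throws DataFormatException {
        Inflater inflater = new Inflater(nowrap);
        try {
            inflater.setInput(data);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                // No progress and more input (or a dictionary) wanted: give up instead of spinning
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new DataFormatException("truncated or unsupported deflate data");
                }
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } finally {
            inflater.end(); // release native zlib memory
        }
    }

    // Hypothetical lenient decoder: zlib-wrapped first, raw deflate as a fallback
    static byte[] inflateLenient(byte[] data) throws DataFormatException {
        try {
            return inflate(data, false);
        } catch (DataFormatException e) {
            return inflate(data, true);
        }
    }

    // Test-data helper: compresses with or without the zlib wrapper
    static byte[] deflate(byte[] data, boolean nowrap) {
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, nowrap);
        deflater.setInput(data);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        while (!deflater.finished()) {
            out.write(buffer, 0, deflater.deflate(buffer));
        }
        deflater.end();
        return out.toByteArray();
    }

    public static void main(String[] args) throws DataFormatException {
        byte[] text = "hello deflate".getBytes(StandardCharsets.UTF_8);
        // Both variants decode back to the same text
        System.out.println(new String(inflateLenient(deflate(text, false)), StandardCharsets.UTF_8));
        System.out.println(new String(inflateLenient(deflate(text, true)), StandardCharsets.UTF_8));
    }
}
```

For large bodies, a streaming decoder (for example java.util.zip.InflaterInputStream, or okio's InflaterSource inside an interceptor) would avoid buffering the whole response.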

Upvotes: 0

Aksenov Vladimir

Reputation: 707

OkHttp version 4.10.0 can already do this automatically if your header contains gzip.

Upvotes: 1

Desolator

Reputation: 22769

After six hours of digging I found the solution, and as usual it was simpler than I thought. I was trying to decompress a page that wasn't gzipped, which is why it failed. Once I hit the second page (which is compressed), I get a gzipped response that the code above handles. If anyone wants the solution: I used a modified interceptor, like the one in this answer, so you don't need a custom function to handle the decompression.

I modified the unzip method to make the okhttp interceptor work with compressed and uncompressed responses:

OkHttpClient.Builder clientBuilder = new OkHttpClient.Builder().addInterceptor(new UnzippingInterceptor());
OkHttpClient client = clientBuilder.build();

The interceptor looks like this:

private class UnzippingInterceptor implements Interceptor {
    @Override
    public Response intercept(Chain chain) throws IOException {
        Response response = chain.proceed(chain.request());
        return unzip(response);
    }

    // Adapted from okhttp3.internal.http.HttpEngine (the original method is private)
    private Response unzip(final Response response) throws IOException {
        if (response.body() == null)
        {
            return response;
        }

        // Check whether the response is gzipped
        String contentEncoding = response.headers().get("Content-Encoding");

        // Decompress gzipped responses; pass everything else through untouched
        if (contentEncoding != null && contentEncoding.equalsIgnoreCase("gzip"))
        {
            GzipSource responseBody = new GzipSource(response.body().source());
            // The body is decoded here, so drop the headers that describe the compressed form
            Headers strippedHeaders = response.headers().newBuilder()
                    .removeAll("Content-Encoding")
                    .removeAll("Content-Length")
                    .build();
            // ResponseBody.create is public API; -1 marks the decompressed length as unknown
            return response.newBuilder().headers(strippedHeaders)
                    .body(ResponseBody.create(response.body().contentType(), -1L, Okio.buffer(responseBody)))
                    .build();
        }
        else
        {
            return response;
        }
    }
}

Upvotes: 25
