Mugoma J. Okomba

Reputation: 3295

Getting a compressed version of web page

I am using HttpClient 4.1 to download a web page. I would like to get a compressed version:

    HttpGet request = new HttpGet(url);
    request.addHeader("Accept-Encoding", "gzip,deflate");

    HttpResponse response = httpClient.execute(request,localContext);
    HttpEntity entity = response.getEntity();

response.getFirstHeader("Content-Encoding") shows "Content-Encoding: gzip"; however, entity.getContentEncoding() returns null.

If I put:

entity = new GzipDecompressingEntity(entity);

I get:

java.io.IOException: Not in GZIP format

It looks like the resulting page is plain text and not compressed even though "Content-Encoding" header shows it's gzipped.
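One way to confirm whether the body is really compressed is to look for the gzip magic number (the bytes 0x1f 0x8b, per RFC 1952) at the start of the entity's bytes. A minimal sketch; the class and method names here are illustrative, not part of HttpClient:

    class GzipCheck {
        // RFC 1952: every gzip stream begins with the two magic bytes 0x1f 0x8b.
        static boolean isGzipped(byte[] body) {
            return body.length >= 2
                && (body[0] & 0xff) == 0x1f
                && (body[1] & 0xff) == 0x8b;
        }

        public static void main(String[] args) throws Exception {
            System.out.println(isGzipped(new byte[] {0x1f, (byte) 0x8b, 8})); // true
            System.out.println(isGzipped("plain text".getBytes("UTF-8")));    // false
        }
    }

If this returns false for the bytes read from EntityUtils.toByteArray(entity), the payload is indeed plain text, which would explain the "Not in GZIP format" exception.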

I have tried this on several URLs (from different websites) but get the same results.

How can I get a compressed version of a web page?

Upvotes: 1

Views: 241

Answers (1)

Denys Séguret

Reputation: 382150

You don't need HttpClient here if you don't mind handling mundane things like unzipping yourself.

You can use the basic URLConnection class to fetch the compressed stream, as demonstrated by the following code:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.URL;
    import java.net.URLConnection;

    public static void main(String[] args) {
        try {
            URL url = new URL("http://code.jquery.com/jquery-latest.js");
            URLConnection con = url.openConnection();
            // comment out the next line if you want something readable in your console
            con.addRequestProperty("Accept-Encoding", "gzip,deflate");
            BufferedReader in = new BufferedReader(new InputStreamReader(con.getInputStream()));
            String l;
            while ((l = in.readLine()) != null) {
                System.out.println(l);
            }
            in.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
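With the Accept-Encoding header left in, the stream you read is raw gzip bytes; to turn it back into text you can wrap it in java.util.zip.GZIPInputStream. A minimal round-trip sketch, using an in-memory buffer as a stand-in for con.getInputStream() (the gunzip helper name is mine):

    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.util.zip.GZIPInputStream;
    import java.util.zip.GZIPOutputStream;

    class GunzipSketch {
        // Decompress a gzip byte[] into a String (helper name is illustrative).
        static String gunzip(byte[] compressed) throws IOException {
            GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed));
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
            return out.toString("UTF-8");
        }

        public static void main(String[] args) throws Exception {
            // Stand-in for the compressed HTTP body; real code would read
            // con.getInputStream() instead of compressing locally.
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
                gz.write("var jQuery = {};".getBytes("UTF-8"));
            }
            System.out.println(gunzip(buf.toByteArray())); // prints: var jQuery = {};
        }
    }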

Upvotes: 1
