Gehan

Reputation: 666

Is there a better way of achieving a worthwhile amount of compression on a Java collection object?

I am currently looking at different alternatives to improve the performance of search operations in an existing web application. Before evaluating other alternatives, I am trying to figure out the maximum improvement that compression alone could deliver for the existing system.

In the existing system, a result set returned in response to a user search is formulated using internal as well as external data resources. The result set is made up of nested Java collection objects. I would like to compress the objects for transfer and decompress them as and when required. The data we want to compress is quite varied, from float vectors to strings to dates.

I have tried out a Java utility to compress and expand a collection object. I tried the code block below to check how compression in Java would reduce the result set size and whether it would improve data transfer over the network. I used GZIP-based compression.

package com.soft.java.Objectcompress;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import java.io.InputStream;
import java.io.OutputStream;

/**
 * 
 * The Class ObjectCompressionUtil.
 * 
 * @param <T> the generic type of the serializable object to be compressed
 */
public class ObjectCompressionUtil<T extends Serializable> {

    /**
     * The compressObject(final T objectToCompress) takes the object 
     * to compress and returns the compressed object as byte array.
     * 
     * @param objectToCompress the object to compress
     * @return the compressed object as byte array
     * @throws IOException Signals that an I/O exception has occurred.
     */
    public byte[] compressObject(final T objectToCompress) throws IOException {

        final ByteArrayOutputStream baos = new ByteArrayOutputStream();
        /*Chain an ObjectOutputStream onto a GZIPOutputStream so the serialized
          bytes are compressed as they are written. try-with-resources closes
          (and thereby finishes) both streams even if writeObject fails.
          Any IOException is propagated to the caller rather than swallowed;
          a caught-and-ignored failure here would silently return a truncated
          byte array.*/
        try (final ObjectOutputStream oos =
                new ObjectOutputStream(new GZIPOutputStream(baos))) {
            oos.writeObject(objectToCompress);
        }

        return baos.toByteArray();
    }

    /**
     * The expandObject(final byte[] objectToExpand) method takes the
     * compressed byte array and returns the expanded (deserialized) object.
     * 
     * @param objectToExpand the compressed object as a byte array
     * @return the expanded object
     * @throws IOException Signals that an I/O exception has occurred.
     * @throws ClassNotFoundException if the class of the serialized object cannot be found
     */
    public T expandObject(final byte[] objectToExpand) throws IOException, ClassNotFoundException {
        /*Chain an ObjectInputStream onto a GZIPInputStream so the bytes are
          decompressed as they are read. Closing the outermost stream also
          closes the streams it wraps, so one try-with-resources suffices.*/
        try (final ObjectInputStream ois = new ObjectInputStream(
                new GZIPInputStream(new ByteArrayInputStream(objectToExpand)))) {
            /*Read the object to expand with the readObject() API method
              of ObjectInputStream and return it.*/
            @SuppressWarnings("unchecked")
            final T expandedObject = (T) ois.readObject();
            return expandedObject;
        }
    }
}
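
For completeness, here is a self-contained round-trip sketch of the same GZIP approach (class name and sample result-set data are made up) that also compares the plain serialized size against the compressed size:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipRoundTrip {

    // Serialize and gzip an object into a byte array.
    static byte[] compress(Serializable obj) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(new GZIPOutputStream(baos))) {
            oos.writeObject(obj);
        }
        return baos.toByteArray();
    }

    // Gunzip and deserialize a byte array back into an object.
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T expand(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(
                new GZIPInputStream(new ByteArrayInputStream(bytes)))) {
            return (T) ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample data with plenty of repetition, which gzip likes.
        ArrayList<String> resultSet = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            resultSet.add("row " + i + " | some repeated result text | 2013-01-0" + (i % 9 + 1));
        }

        // Plain serialized size, for comparison.
        ByteArrayOutputStream plain = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(plain)) {
            oos.writeObject(resultSet);
        }

        byte[] gz = compress(resultSet);
        ArrayList<String> restored = expand(gz);

        System.out.println("plain=" + plain.size() + "B gzip=" + gz.length
                + "B roundTripOk=" + restored.equals(resultSet));
    }
}
```

On repetitive result sets like this the compressed form is much smaller than the plain serialized form; for small or high-entropy payloads the gain can be marginal or even negative, which is worth measuring per case.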

I also checked for similar problems on this forum; there were a few, but they did not explicitly answer my question, so I thought of posting it here.

Would there be a better way of achieving a worthwhile amount of compression of the result set? Ease of compression and speed of decompression are the most important factors for me, with compression ratio as a second preference.

Would the type/combination of streams used have an effect on the expected outcome?

Are there any custom or third-party compression algorithms that offer significantly better performance?

Update: some possible leads on a related issue

Upvotes: 1

Views: 1368

Answers (2)

Thomas Mueller

Reputation: 50087

I would first analyze

  • what the data looks like (do you have large objects, small objects, repeated objects, ...),
  • the time budget you have (for compression and decompression),
  • how much you gain by compression,
  • how much you can compress the data.

And only then decide which approach to use. You wrote:

The result set is made up of nested Java collection objects. ... The data we want to compress is quite varied, from float vectors to strings to dates.

You could try to compress individual items (for example using the approach you presented). It's probably only worth it for large objects, or if you have lots of CPU cycles to spare.
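
Putting rough numbers on the time budget and the gain can be done with a crude harness like this one (class name is made up; `System.nanoTime()` wall-clock figures without warm-up are only indicative):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPOutputStream;

public class CompressionBudget {

    // Gzip a payload once and report {input size, output size, elapsed microseconds}.
    public static long[] measure(byte[] payload) throws IOException {
        long start = System.nanoTime();
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(baos)) {
            gz.write(payload);
        }
        long elapsedMicros = (System.nanoTime() - start) / 1_000;
        return new long[] { payload.length, baos.size(), elapsedMicros };
    }

    public static void main(String[] args) throws IOException {
        // Highly repetitive payload: best case for compression.
        byte[] repetitive = new String(new char[20_000]).replace('\0', 'x').getBytes();
        long[] r = measure(repetitive);
        System.out.println("in=" + r[0] + "B out=" + r[1] + "B time=" + r[2] + "us");
    }
}
```

Running this against payloads that resemble the real result sets (rather than the synthetic one above) is what makes the decision meaningful.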

For repeated, immutable objects, such as Strings, you could use a simple "re-use existing equal objects" cache, such as this one, or simply use String.intern() (but that one has many disadvantages).
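
A minimal sketch of such a re-use cache (the class and method names here are made up) could look like:

```java
import java.util.HashMap;
import java.util.Map;

/*
 * A tiny "re-use existing equal objects" cache: the first time a value is
 * seen it is stored; later equal values are replaced by the stored instance,
 * so repeated strings in a result set share one object instead of many copies.
 */
public class InternCache<T> {
    private final Map<T, T> cache = new HashMap<>();

    public T intern(T value) {
        // putIfAbsent returns the previously stored equal value, or null if new.
        T existing = cache.putIfAbsent(value, value);
        return existing != null ? existing : value;
    }
}
```

Unlike `String.intern()`, entries here live only as long as the cache itself, and the cache works for any immutable type with proper `equals`/`hashCode`; use a `WeakHashMap` or `ConcurrentHashMap` instead if lifetime or thread safety matters.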

Upvotes: 2

Mark Adler

Reputation: 112189

You can look at LZ4 for not-as-good compression, but much faster decompression than gzip.
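
LZ4 itself requires a third-party binding (e.g. lz4-java). To get a feel for the same speed-versus-ratio trade-off using only the standard library, `java.util.zip.Deflater` exposes compression levels; this sketch (class name made up) only illustrates the trade-off, it is not LZ4:

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.Deflater;

public class SpeedVsRatio {

    // Compress input at the given level and return the compressed length.
    static int compressedLength(byte[] input, int level) {
        Deflater deflater = new Deflater(level);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        return out.size();
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 500; i++) {
            sb.append("the quick brown fox ");
        }
        byte[] data = sb.toString().getBytes();
        System.out.println("BEST_SPEED:       " + compressedLength(data, Deflater.BEST_SPEED) + " bytes");
        System.out.println("BEST_COMPRESSION: " + compressedLength(data, Deflater.BEST_COMPRESSION) + " bytes");
    }
}
```

`BEST_SPEED` trades some ratio for throughput within the DEFLATE family; LZ4 pushes much further in the same direction, with decompression typically several times faster than gzip at a noticeably worse ratio.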

Upvotes: 1
