Adam G

Reputation: 77

Java Deflater for large set of random strings

I am using the Deflater class to try to compress a large set of random strings. My compression and decompression methods look like this:

public static String compressAndEncodeBase64(String text) {
    try {
        ByteArrayOutputStream os = new ByteArrayOutputStream();
        try (DeflaterOutputStream dos = new DeflaterOutputStream(os)) {
            // Use an explicit charset so compression and decompression agree.
            dos.write(text.getBytes(StandardCharsets.UTF_8));
        }
        byte[] bytes = os.toByteArray();

        return Base64.getEncoder().encodeToString(bytes);
    } catch (Exception e) {
        log.info("Caught exception when trying to compress {}: ", text, e);
    }
    return null;
}

public static String decompressB64(String compressedAndEncodedText) {
    try {
        byte[] decodedText = Base64.getDecoder().decode(compressedAndEncodedText);

        ByteArrayOutputStream os = new ByteArrayOutputStream();
        try (OutputStream ios = new InflaterOutputStream(os)) {
            ios.write(decodedText);
        }
        byte[] decompressedBArray = os.toByteArray();
        return new String(decompressedBArray, StandardCharsets.UTF_8);
    } catch (Exception e){
        log.error("Caught following exception when trying to decode and decompress text {}: ", compressedAndEncodedText, e);
        throw new BadRequestException(Constants.ErrorMessages.COMPRESSED_GROUPS_HEADER_ERROR);
    }
}

However, when I test this on a large set of random strings, my "compressed" string is larger than the original string. Even for a relatively small random string, the compressed data is longer. For example, this unit test fails:

@Test
public void testCompressDecompressRandomString() {
    String orig = RandomStringUtils.random(71, true, true);
    String compressedString = compressAndEncodeBase64(orig);
    Assertions.assertTrue(orig.length() - compressedString.length() > 0,
            "The original string has length " + orig.length()
                    + ", while the compressed string has length " + compressedString.length());
}

Can anyone explain what's going on, and suggest a possible alternative?

Note: I tried using the Deflater without the Base64 encoding:

public static String compress(String data) {
    Deflater deflater = new Deflater();
    deflater.setInput(data.getBytes(StandardCharsets.UTF_8));
    deflater.finish();
    byte[] compressed = new byte[1024];
    int compressedSize = deflater.deflate(compressed);
    byte[] returnValues = new byte[compressedSize];
    System.arraycopy(compressed, 0, returnValues, 0, compressedSize);
    log.info("The Original String: " + data + "\n Size: " + data.length());
    log.info("The Compressed String Output: " + new String(compressed) + "\n Size: " + compressedSize);
    return new String(returnValues, StandardCharsets.UTF_8);
}

My test still fails, however.

Upvotes: 0

Views: 349

Answers (1)

Mark Adler

Reputation: 112502

First off, you aren't going to get much or any compression on short strings. Compressors need more data to both collect statistics on the data and to have previous data in which to look for repeated strings.
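
A rough sketch of that effect (the class name here is just for illustration; exact byte counts depend on the zlib version behind the JDK) shows that deflating even a tiny string produces output longer than the input, because the fixed header, block, and checksum overhead dominates:

import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;

public class ShortStringOverhead {
    public static void main(String[] args) throws Exception {
        byte[] input = "hello".getBytes(StandardCharsets.UTF_8);

        ByteArrayOutputStream os = new ByteArrayOutputStream();
        try (DeflaterOutputStream dos = new DeflaterOutputStream(os)) {
            dos.write(input);
        }

        // For a 5-byte input, the zlib header, block header, and Adler-32
        // trailer alone make the output longer than the input.
        System.out.println("input bytes:      " + input.length);
        System.out.println("compressed bytes: " + os.toByteArray().length);
    }
}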

Second, if you're testing with random data, you are further crippling the compressor, since now there are no repeated strings. For your test case with random alphanumeric strings, the only compression you can get is to take advantage of the fact that there are only 62 possible values for each byte. That can be compressed by a factor of log(62)/log(256) = 0.744. Even then, you need to have enough input to cancel the overhead of the code description. Your test case of 71 characters will always be compressed to 73 bytes by deflate, which is essentially just copying the data with a small overhead. There isn't enough input to justify the code description to take advantage of the limited character set. If I have 1,000,000 random characters from that set of 62, then deflate can compress that to about 752,000 bytes.
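
To see those numbers concretely, a sketch along these lines (reusing RandomStringUtils from the test above; the result is approximate and varies slightly with the zlib version) compresses one million random alphanumeric characters to roughly 752,000 bytes, close to the log(62)/log(256) bound:

import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;

import org.apache.commons.lang3.RandomStringUtils;

public class AlphanumericBound {
    public static void main(String[] args) throws Exception {
        // One million random characters drawn from the 62 letters and digits.
        byte[] input = RandomStringUtils.random(1_000_000, true, true)
                .getBytes(StandardCharsets.UTF_8);

        ByteArrayOutputStream os = new ByteArrayOutputStream();
        try (DeflaterOutputStream dos = new DeflaterOutputStream(os)) {
            dos.write(input);
        }

        double bound = Math.log(62) / Math.log(256);   // ~0.744 output bytes per input byte
        System.out.printf("entropy bound:  %.0f bytes%n", input.length * bound);
        System.out.printf("deflate output: %d bytes%n", os.toByteArray().length);  // ~752,000
    }
}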

Third, you are then expanding the resulting compressed data by a factor of 1.333 by encoding it using Base64. So if I take that compression by a factor of 0.752 and then expand it by 1.333, I get an overall expansion of 1.002! You won't get anywhere that way on random characters from a set of 62, no matter how long the input is.
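
That 1.333 factor is just Base64's four output characters for every three input bytes; a quick sketch of the arithmetic:

import java.util.Base64;

public class Base64Expansion {
    public static void main(String[] args) {
        byte[] compressed = new byte[752_000];   // stand-in for the deflate output above

        int encodedLength = Base64.getEncoder().encodeToString(compressed).length();

        // Base64 emits 4 characters for every 3 input bytes (rounded up), so
        // 752,000 bytes become 1,002,668 characters: a net expansion of ~1.002
        // relative to the original 1,000,000 characters.
        System.out.println("encoded length: " + encodedLength);
        System.out.println("expansion over compressed bytes: "
                + (double) encodedLength / compressed.length);
    }
}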

Given all that, you need to do your testing on real-world inputs. I suspect that your application does not have randomly-generated data. Don't attempt compression on short strings. Combine your strings into much longer input, so that the compressor has something to work with. If you must encode with Base64, then you must. But expect that there may be expansion instead of compression. You could include in your output format an option for chunks to be compressed or not compressed, indicated by a leading byte. Then when compressing, if it doesn't compress, send it without compression instead. You can also try a more efficient encoding, e.g. Base85, or whatever number of characters you can transmit transparently.
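
A minimal sketch of that flag-byte scheme (the names wrap and unwrap and the single-byte flag values are illustrative choices, not something prescribed above):

import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterOutputStream;

public class MaybeCompress {
    private static final byte RAW = 0;        // payload is stored as-is
    private static final byte DEFLATED = 1;   // payload is deflate-compressed

    /** Compress if it helps; otherwise store the input behind a RAW flag byte. */
    public static byte[] wrap(byte[] input) throws Exception {
        ByteArrayOutputStream os = new ByteArrayOutputStream();
        try (DeflaterOutputStream dos = new DeflaterOutputStream(os)) {
            dos.write(input);
        }
        byte[] deflated = os.toByteArray();

        byte[] out;
        if (deflated.length < input.length) {
            out = new byte[deflated.length + 1];
            out[0] = DEFLATED;
            System.arraycopy(deflated, 0, out, 1, deflated.length);
        } else {
            out = new byte[input.length + 1];
            out[0] = RAW;
            System.arraycopy(input, 0, out, 1, input.length);
        }
        return out;
    }

    /** Reverse wrap(): inspect the leading byte and inflate only if needed. */
    public static byte[] unwrap(byte[] wrapped) throws Exception {
        byte[] payload = new byte[wrapped.length - 1];
        System.arraycopy(wrapped, 1, payload, 0, payload.length);
        if (wrapped[0] == RAW) {
            return payload;
        }
        ByteArrayOutputStream os = new ByteArrayOutputStream();
        try (OutputStream ios = new InflaterOutputStream(os)) {
            ios.write(payload);
        }
        return os.toByteArray();
    }
}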

Upvotes: 2
