Deepak

Reputation: 437

GZIPInputStream decompression does not work for compressed data longer than 532 bytes

I have implemented compression and decompression using GZIPOutputStream/GZIPInputStream in Java. It works fine for small amounts of data, but if the length of the data after compression is greater than 532 bytes, then my decompression does not work correctly.

Thanks Bapi

Upvotes: 1

Views: 4889

Answers (3)

mP.

Reputation: 18266

Looks like a character encoding/decoding problem to me. You should use Readers/Writers to handle Strings, or encode explicitly with String.getBytes(charset). Constructs like new String(byte[]) without a charset are not the proper way.

You really should read in a loop and check the byte count returned by each read() call to ensure everything is read back!
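As a minimal sketch of the loop described above (readFully is a hypothetical helper name, not part of the asker's code):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class ReadLoop {
  // Reads an InputStream to the end, honoring the return value of read().
  static byte[] readFully(InputStream in) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buf = new byte[256];
    int n;
    // read() may return fewer bytes than buf.length; -1 signals end of stream
    while ((n = in.read(buf)) != -1) {
      out.write(buf, 0, n);
    }
    return out.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    // Encode with an explicit charset rather than the platform default
    byte[] data = "example \u00A3".getBytes(StandardCharsets.UTF_8);
    byte[] roundTrip = readFully(new ByteArrayInputStream(data));
    System.out.println(new String(roundTrip, StandardCharsets.UTF_8));
  }
}
```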

Upvotes: 2

McDowell

Reputation: 108979

To reiterate what others have said:

  • It is often the case that str.length() != str.getBytes().length. Many character encodings are variable-length (like UTF-8, UTF-16 or Windows-949), so the number of chars in a String rarely matches the number of encoded bytes.
  • Close your OutputStreams to ensure that all buffered data is flushed and written.
  • Use the return value of InputStream.read to see how many bytes have actually been read. There is no guarantee that all the data will be read in one go.
  • Be careful when using the String class for encoding/decoding; always specify a Charset.

String compression/decompression methods

  // Imports required by these methods:
  //   import java.io.*;
  //   import java.nio.charset.Charset;
  //   import java.util.zip.GZIPInputStream;
  //   import java.util.zip.GZIPOutputStream;

  private static byte[] compress(String str, Charset charset) {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    try {
      OutputStream deflater = new GZIPOutputStream(buffer);
      deflater.write(str.getBytes(charset));
      deflater.close();
    } catch (IOException e) {
      throw new IllegalStateException(e);
    }
    return buffer.toByteArray();
  }

  private static String decompress(byte[] data,
      Charset charset) {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    ByteArrayInputStream in = new ByteArrayInputStream(data);
    try {
      InputStream inflater = new GZIPInputStream(in);
      byte[] bbuf = new byte[256];
      while (true) {
        int r = inflater.read(bbuf);
        if (r < 0) {
          break;
        }
        buffer.write(bbuf, 0, r);
      }
    } catch (IOException e) {
      throw new IllegalStateException(e);
    }
    return new String(buffer.toByteArray(), charset);
  }

  public static void main(String[] args) throws IOException {
    StringBuilder sb = new StringBuilder();
    while (sb.length() < 10000) {
      sb.append("write the data here \u00A3");
    }
    String str = sb.toString();
    Charset utf8 = Charset.forName("UTF-8");
    byte[] compressed = compress(str, utf8);

    System.out.println("String len=" + str.length());
    System.out.println("Encoded len="
        + str.getBytes(utf8).length);
    System.out.println("Compressed len="
        + compressed.length);

    String decompressed = decompress(compressed, utf8);
    System.out.println(decompressed.equals(str));
  }

(Note that because these are in-memory streams, I am not being strict about how I open or close them.)

Upvotes: 5

Peter Lawrey

Reputation: 533930

I would suggest you use gCompress.close() rather than finish();

I also suggest that you cannot rely on str.length() being the right number of bytes to read. The encoded data could be longer, in which case the String will be truncated.

You also ignore the return value of read(). read() is only guaranteed to read one byte, and is unlikely to read exactly str.length() bytes of data, so you are likely to end up with lots of trailing nul bytes (\0). Instead you should expect to read str.getBytes(charset).length bytes.
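A small sketch of the char-count versus byte-count mismatch described above (the string literal is an illustrative example, not from the asker's code):

```java
import java.nio.charset.StandardCharsets;

public class LengthDemo {
  public static void main(String[] args) {
    String str = "price: \u00A35"; // contains '£', which encodes to 2 bytes in UTF-8
    byte[] encoded = str.getBytes(StandardCharsets.UTF_8);
    // The char count and the byte count differ for non-ASCII text,
    // so a buffer sized by str.length() would truncate the encoded data.
    System.out.println("chars=" + str.length());
    System.out.println("bytes=" + encoded.length);
  }
}
```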

Upvotes: 2
