Jason Long

Reputation: 79

I want to compress a file with Deflater, but it doesn't seem to work

I have to precompress a file before uploading it, and I want to use DeflaterOutputStream for that, but it doesn't work: if I have a file of about 10 MB, it is still 10 MB after I compress it into the output stream, and I don't know why. I would also like to know how to turn the output stream into a file with this approach, so that I can compare the size of the OutputStream with the file size to judge whether the file was actually compressed. Thanks a lot.

import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;

public class test {
    public static int m_lvl;

    public static void main(String[] args) {
        try {
            Path file = new File("a.jpg").toPath();
            ByteArrayOutputStream bos = new ByteArrayOutputStream((int) new File("a.png").toPath().toFile().length());
            OutputStream bo = new DeflaterOutputStream(bos, new Deflater(m_lvl, true), 512);
            Files.copy(file, bos);
            System.out.println((int) new File("a.zip").toPath().toFile().length());
            System.out.println(bos.size());

            bo.close();
            bos.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

Upvotes: 1

Views: 595

Answers (1)

João Rebelo

Reputation: 79

From the question it looks like you are trying to compress any file.

There are a few concepts you should try to understand that have nothing to do with Java itself, but with data compression and reconstruction.

There are at least two types (as far as my limited knowledge goes :)) of algorithms for data compression/decompression:

  1. Lossless algorithms, for example DEFLATE (the algorithm used by ZIP)
  2. Lossy algorithms, for example JPEG

Without going into much detail, what you want to do is take some raw data, compress it (so that it occupies less space) and send it over to someone.

From the list above, you pick one or the other depending on your needs. For example, you would pick a lossless algorithm if you require that no information is lost when the data is compressed (imagine a .docx file or any other document, or the binary of a computer program).

Otherwise, you might pick a lossy algorithm if, for example, you're displaying an image to a human user and don't mind some information being lost, because the human eye can't see all of it anyway.

Now, the problem is that what you're trying to do is compress an already compressed file.

Imagine the following:

  1. You have a text file containing the letter "a" repeated 65536 times (using 64 KB of disk space).

  2. When you compress it with a lossless algorithm, you might get a compressed file containing something like "65536a" on a single line, using perhaps 6 bytes of disk space and meaning "a repeated 65536 times in the original file".

  3. Imagine that for some reason you don't need to know how many times "a" is repeated. That is lossy compression: you use a lossy algorithm and get a file with a single line which reads "a", using perhaps 1 byte of disk space and meaning "a appears in the original file".

  4. Now, if for some reason you want a lossless compressed version of the lossy compressed file, then by the same reasoning you might get something like "1a", which now occupies 2 bytes of disk space; hence it is a bigger file.

That is basically what you're trying to do: compress an already compressed file. Depending on the algorithm in question, you might or might not get a smaller file. Most probably, when applying lossless compression to data that has already been compressed with a lossy algorithm, you will get a bigger file, as in my example.
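The "65536a" versus "1a" example can be tried out with a toy run-length encoder. This is only a sketch to illustrate the reasoning; the class and method names (`RunLengthSketch`, `encode`) are made up for this illustration, and real compressors such as DEFLATE work quite differently.

```java
import java.util.Arrays;

final class RunLengthSketch {
    /**
     * Toy run-length encoding: each run becomes "<count><char>",
     * for example "aaab" becomes "3a1b".
     * Assumption: the input contains no digits, so the output stays unambiguous.
     */
    static String encode(String input) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            int run = 1;
            // Count how many times the current character repeats.
            while (i + run < input.length() && input.charAt(i + run) == c) {
                run++;
            }
            out.append(run).append(c);
            i += run;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        char[] chars = new char[65536];
        Arrays.fill(chars, 'a');
        String redundant = new String(chars);

        // Highly redundant input shrinks a lot.
        System.out.println(encode(redundant)); // prints "65536a" (6 characters)
        // Input with no redundancy left grows instead.
        System.out.println(encode("a"));       // prints "1a" (2 characters)
    }
}
```

The second call is the point: once the redundancy is gone, encoding again only adds overhead.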

To sum up: whether compression helps depends on whether or not the data contains redundant information that can be expressed in a shorter way.
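Here is a rough sketch of how this could be checked with the same `java.util.zip` classes used in the question. The names `DeflateSizeSketch` and `deflatedSize` are hypothetical, and the random bytes are only an assumed stand-in for an already compressed file such as a JPEG. Note three things about the code in the question: `Files.copy(file, bos)` writes straight into `bos`, so nothing ever passes through the `DeflaterOutputStream` `bo`; `m_lvl` is never assigned, so it is 0, which is `Deflater.NO_COMPRESSION`; and the size is printed before `bo` is closed. The sketch writes through the deflater stream and closes it before measuring.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.Arrays;
import java.util.Random;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;

final class DeflateSizeSketch {
    /** Returns how many bytes the data occupies after deflating it at the given level. */
    static int deflatedSize(byte[] data, int level) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        Deflater deflater = new Deflater(level, true); // true = raw deflate, no zlib header
        try (OutputStream out = new DeflaterOutputStream(sink, deflater, 512)) {
            out.write(data); // write to the deflater stream, not directly to the sink
        } finally {
            deflater.end(); // release the native resources of our own Deflater
        }
        // The stream was closed above, so all compressed bytes have reached the sink.
        return sink.size();
    }

    public static void main(String[] args) throws IOException {
        byte[] redundant = new byte[1 << 20];
        Arrays.fill(redundant, (byte) 'a');

        byte[] random = new byte[1 << 20];
        new Random(42).nextBytes(random); // assumption: behaves like already compressed data

        System.out.println("original size:        " + redundant.length);
        System.out.println("redundant, level 9:   "
                + deflatedSize(redundant, Deflater.BEST_COMPRESSION));
        System.out.println("random, level 9:      "
                + deflatedSize(random, Deflater.BEST_COMPRESSION));
        System.out.println("redundant, level 0:   "
                + deflatedSize(redundant, Deflater.NO_COMPRESSION));
    }
}
```

To judge whether compression was worthwhile, compare the returned size with the original length and upload the original bytes when the deflated version is not smaller. For a real file, the byte array could come from `Files.readAllBytes`, or the sink could be a `FileOutputStream` instead of a `ByteArrayOutputStream`.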

EDIT 2: Refactored the response in order to clarify it a little more. EDIT 3: Fixed a typo in my list of algorithms.

Upvotes: 3
