Soni Sahai

Reputation: 13

Zipping a file through Java in an efficient way

I am generating a file of size 1 GB, and now I have to zip this file through Java itself.

    FileOutputStream fileOutput = new FileOutputStream(file);
    BufferedWriter bw = new BufferedWriter(new OutputStreamWriter(fileOutput));

    addContent(abc, def, bw);

    bw.close();
    fileOutput.close();

Please advise, as I am looking to make a custom method that will accept a folder path and a file name as arguments and will zip the file, something like below:

    public void generateZipForFile(String folderPath, String fileName)
    {
        // please advise the logic to zip the file
    }

Upvotes: 0

Views: 275

Answers (1)

jboi

Reputation: 11912

I'm assuming that by 'efficient' you mean as fast as possible. You can either use GZIPOutputStream to zip one large file, or ZipOutputStream to zip a number of files and combine them into one zip archive. Both are explained well in the standard Javadocs.
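For the single-file case you asked about, a minimal sketch with ZipOutputStream could look like this (the class name ZipUtil and the ".zip" target naming are just illustrative):

    import java.io.BufferedInputStream;
    import java.io.BufferedOutputStream;
    import java.io.File;
    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.util.zip.ZipEntry;
    import java.util.zip.ZipOutputStream;

    public class ZipUtil {

        // Zips a single file (folderPath/fileName) into folderPath/fileName.zip.
        public static void generateZipForFile(String folderPath, String fileName)
                throws IOException {
            String source = folderPath + File.separator + fileName;
            String target = source + ".zip";

            try (BufferedInputStream in = new BufferedInputStream(new FileInputStream(source));
                 ZipOutputStream out = new ZipOutputStream(
                         new BufferedOutputStream(new FileOutputStream(target)))) {

                out.putNextEntry(new ZipEntry(fileName)); // one entry per file in the archive
                byte[] buffer = new byte[64 * 1024];      // 64 KB copy buffer
                int read;
                while ((read = in.read(buffer)) != -1) {
                    out.write(buffer, 0, read);
                }
                out.closeEntry();
            }
        }
    }

This is the straightforward single-threaded version; everything below is about making it faster.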

To keep a long story short: to be efficient, i.e. to use more than one CPU simultaneously, divide your large file into blocks, let different threads zip them simultaneously, and concatenate the outputs. On the receiving side, just do the same in reverse.

The one downside of the standard zip classes is that both work single-threaded, on just one CPU/core, so they might not be efficient in your terms. This is because the zipping algorithm itself is single-threaded. Existing parallelized versions take blocks of data and zip them in different threads, with corresponding logic for unzipping. You will find plenty of material about this by searching for PIGZ on the net.
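To make the block idea concrete, here is a rough sketch of the same principle built on the standard GZIPOutputStream (this is not PIGZ; the class and method names are made up for the example). It compresses fixed-size blocks on a thread pool and writes the compressed members back in order. This works because concatenated gzip members are themselves a valid gzip stream, so plain gunzip can decompress the result in one pass:

    import java.io.ByteArrayOutputStream;
    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;
    import java.util.zip.GZIPOutputStream;

    public class ParallelGzip {

        private static final int BLOCK_SIZE = 1024 * 1024; // 1 MB blocks, see below

        public static void compress(String source, String target) throws Exception {
            ExecutorService pool = Executors.newFixedThreadPool(
                    Runtime.getRuntime().availableProcessors());
            try {
                List<Future<byte[]>> results = new ArrayList<>();

                // Read the input in fixed-size blocks; each block is
                // compressed independently on its own worker thread.
                try (FileInputStream in = new FileInputStream(source)) {
                    byte[] block = new byte[BLOCK_SIZE];
                    int read;
                    while ((read = in.read(block)) != -1) {
                        final byte[] copy = Arrays.copyOf(block, read);
                        results.add(pool.submit(() -> gzipBlock(copy)));
                    }
                }

                // Writing the compressed blocks back in their original order
                // yields one valid gzip stream of concatenated members.
                try (FileOutputStream out = new FileOutputStream(target)) {
                    for (Future<byte[]> result : results) {
                        out.write(result.get());
                    }
                }
            } finally {
                pool.shutdown();
            }
        }

        // Compresses one block into a stand-alone gzip member.
        private static byte[] gzipBlock(byte[] data) throws IOException {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
                gzip.write(data);
            }
            return buffer.toByteArray();
        }
    }

Note that this sketch queues every compressed block in memory before writing; for really large inputs you would want to bound the queue.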

EDIT according to comment from @VictorSeifert

The compression ratio depends mainly on three things: your data (obviously), the depth of the compression, and the block size. The compression depth can be controlled in the Java classes using setLevel(). The block size can be chosen freely: the larger the block, the better the compression, but the less parallelism can be achieved.
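For illustration, with ZipOutputStream the level is set like this, using the constants from java.util.zip.Deflater (GZIPOutputStream does not expose setLevel() directly; there you would subclass it and call setLevel() on its protected Deflater):

    ZipOutputStream out = new ZipOutputStream(new FileOutputStream("data.zip"));
    out.setLevel(Deflater.BEST_SPEED);           // level 1: fastest, weakest compression
    // out.setLevel(Deflater.BEST_COMPRESSION);  // level 9: slowest, strongest compression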

PIGZ, for example, uses 128 KB blocks by default and maintains a 32 KB dictionary so that compression gets better from block to block. I myself got good results with 1 MB blocks and no dictionary. The dictionary adds a lot of complexity to the threading model, and my problems simply have not been big enough so far to get into this.

Upvotes: 1
