Stefan

Reputation: 2068

Parallel BZip2 Compression

I'm using Apache Commons Compress for Java to compress multiple log files into a single tar.bz2 archive.

However, compression takes a very long time (more than 12 hours), because I compress around 20 GB of files a day.

As this library compresses files on a single thread, I'd like to know whether there is a way to do this multi-threaded.

I found many solutions (the pbzip2 command-line tool and some C++ libraries), but all I found for Java is this blog post:

https://plus.google.com/117421466255362255970/posts/3jfKVu325zh

It seems that I can't use it in my Java application.

Is there anything out there? What would you recommend? Or is there another, faster solution with compression ratios similar to bzip2?

Upvotes: 4

Views: 3026

Answers (3)

af1n

Reputation: 440

Try the at4j implementation of BZip2OutputStream. According to the manual, it supports parallel compression: http://at4j.sourceforge.net/releases/current/pg/ch04.xhtml

Upvotes: 0

reprogrammer

Reputation: 14728

If a parallel implementation of bzip2 in Java doesn't exist, you can resort to invoking pbzip2 from within your Java application.
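As a rough sketch of what invoking pbzip2 from Java could look like: the class below builds a `tar` command that delegates compression to pbzip2 and runs it with `ProcessBuilder`. It assumes GNU tar and pbzip2 are installed and on the PATH. The class name `Pbzip2Runner` and its method names are made up for illustration.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of any library.
final class Pbzip2Runner {

    // Builds: tar --use-compress-program=pbzip2 -cf <archive> <inputs...>
    // Assumes GNU tar, which accepts --use-compress-program.
    static List<String> buildCommand(Path archive, List<Path> inputs) {
        List<String> cmd = new ArrayList<>();
        cmd.add("tar");
        cmd.add("--use-compress-program=pbzip2");
        cmd.add("-cf");
        cmd.add(archive.toString());
        for (Path input : inputs) {
            cmd.add(input.toString());
        }
        return cmd;
    }

    // Runs the command and returns the exit code (0 normally means success).
    static int run(Path archive, List<Path> inputs)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(archive, inputs))
                .inheritIO() // forward tar/pbzip2 output to this JVM's console
                .start();
        return process.waitFor();
    }
}
```

Calling `Pbzip2Runner.run(Path.of("logs.tar.bz2"), logFiles)` would then produce a single tar.bz2 archive, with pbzip2 deciding how many cores to use. Check the exit code before deleting the original logs.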

Upvotes: 1

Peter Lawrey

Reputation: 533920

As you have multiple files, you can compress each file in a different thread. As your process is CPU-bound, I suggest creating a fixed-size thread pool, i.e. an ExecutorService, and adding one task per file to compress.
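A minimal sketch of the per-file approach, under some assumptions: `GZIPOutputStream` from the JDK stands in for the compressor so the example runs without extra dependencies. With Commons Compress on the classpath you would wrap the output in `BZip2CompressorOutputStream` instead and use a `.bz2` suffix. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, not part of any library.
final class ParallelFileCompressor {

    // Compresses one file to "<name>.gz" next to it and returns the new path.
    // GZIP is a stand-in here; swap in a bzip2 stream for .bz2 output.
    static Path compressOne(Path source) throws IOException {
        Path target = source.resolveSibling(source.getFileName() + ".gz");
        try (InputStream in = Files.newInputStream(source);
             OutputStream out = new GZIPOutputStream(Files.newOutputStream(target))) {
            in.transferTo(out);
        }
        return target;
    }

    // Submits one task per file to a fixed-size pool and waits for all of them.
    static List<Path> compressAll(List<Path> sources, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Path>> futures = new ArrayList<>();
            for (Path source : sources) {
                futures.add(pool.submit(() -> compressOne(source)));
            }
            List<Path> results = new ArrayList<>();
            for (Future<Path> future : futures) {
                results.add(future.get()); // rethrows any failure from the task
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

A pool size of `Runtime.getRuntime().availableProcessors()` is a reasonable starting point for CPU-bound work. Note that this yields one compressed file per log rather than a single tar.bz2 archive, and a single very large file still occupies only one thread.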

Note: if pbzip2 does what you want, I would call it from Java. You might find it is faster even with one thread, as the BZIP2 libraries I have seen for Java are not natively implemented (unlike JAR, ZIP and GZIP).

Upvotes: 2
