Reputation: 2068
I'm using Apache Commons Compress for Java to compress multiple log files into a single tar.bz2
archive.
However, compression takes very long (> 12 hours), because I compress around 20 GB of files a day.
As this library compresses files in a single thread, I'd like to know if there is a way to do this multi-threaded.
I found many solutions (command-line pbzip2 or some C++ libraries), but all I found for Java is this blog post:
https://plus.google.com/117421466255362255970/posts/3jfKVu325zh
It seems that I can't use it in my Java application.
Is there anything out there? What would you recommend? Or is there another, faster solution with compression ratios similar to bzip2?
Upvotes: 4
Views: 3026
Reputation: 440
Try the at4j implementation of BZip2OutputStream. According to the manual, it supports parallel compression. http://at4j.sourceforge.net/releases/current/pg/ch04.xhtml
Upvotes: 0
Reputation: 14728
If a parallel implementation of bzip2 in Java doesn't exist, you can resort to invoking pbzip2 from within your Java application.
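Calling out to pbzip2 could look roughly like the sketch below. It assumes GNU tar and pbzip2 are installed and on the PATH, and it lets tar pipe the archive through pbzip2 via `--use-compress-program`. The class name `Pbzip2Tar` and its methods are made up for illustration.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

public class Pbzip2Tar {

    /** Builds the tar command line; assumes GNU tar and pbzip2 are on the PATH. */
    static List<String> buildCommand(Path archive, Path sourceDir) {
        return Arrays.asList(
                "tar",
                "--use-compress-program=pbzip2", // tar pipes its output through pbzip2
                "-cf", archive.toString(),       // create the given archive file
                "-C", sourceDir.toString(),      // change into the log directory first
                ".");                            // archive everything in it
    }

    /** Runs the command and returns the exit code (0 means success). */
    static int run(Path archive, Path sourceDir) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(archive, sourceDir))
                .inheritIO() // forward tar's stdout/stderr to this JVM's console
                .start();
        return process.waitFor();
    }
}
```

A call such as `Pbzip2Tar.run(Paths.get("logs.tar.bz2"), Paths.get("/var/log/myapp"))` would then produce a single tar.bz2; the paths here are placeholders.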
Upvotes: 1
Reputation: 533920
As you have multiple files, you can compress each file in a different thread. As your process is CPU-bound, I suggest creating a fixed-size thread pool, i.e. an ExecutorService, and adding a task for each file to compress.
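A minimal sketch of that thread-pool idea follows. It produces one compressed file per input rather than a single tar.bz2, and the `Codec` interface and the class name are hypothetical helpers introduced here so that the compressor can be swapped.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelFileCompressor {

    /** Hypothetical hook: wraps a raw stream in a compressing stream. */
    @FunctionalInterface
    public interface Codec {
        OutputStream wrap(OutputStream raw) throws IOException;
    }

    /** Compresses every input file to "<name><suffix>", one pool task per file. */
    public static List<Path> compressAll(List<Path> inputs, String suffix, Codec codec)
            throws IOException, InterruptedException {
        // CPU-bound work: one worker per available core.
        int threads = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Path>> pending = new ArrayList<>();
            for (Path input : inputs) {
                pending.add(pool.submit(() -> compressOne(input, suffix, codec)));
            }
            List<Path> outputs = new ArrayList<>();
            for (Future<Path> future : pending) {
                try {
                    outputs.add(future.get()); // blocks until that file is done
                } catch (ExecutionException e) {
                    throw new IOException("Compression task failed", e.getCause());
                }
            }
            return outputs;
        } finally {
            pool.shutdown();
        }
    }

    private static Path compressOne(Path input, String suffix, Codec codec) throws IOException {
        Path output = input.resolveSibling(input.getFileName() + suffix);
        try (OutputStream raw = Files.newOutputStream(output);
             OutputStream compressed = codec.wrap(raw)) {
            Files.copy(input, compressed); // stream the file through the codec
        }
        return output;
    }
}
```

With Commons Compress on the classpath, passing `".bz2"` and `BZip2CompressorOutputStream::new` as the codec should give per-file bzip2 output; the JDK's `GZIPOutputStream::new` fits the same interface.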
Note: if pbzip2 does what you want, I would call it from Java. You might find it is fast even with one thread, as the BZIP2 libraries I have seen for Java are not natively implemented (unlike JAR, ZIP and GZIP).
Upvotes: 2