osgx

Reputation: 94485

What is the fastest bzip2 decompressor?

Which implementation of bzip2 has the highest decompression speed?

One candidate is micro-bunzip.c (http://bitbucket.org/james_taylor/seek-bzip2/src/tip/micro-bunzip.c), which claims:

Size and speed optimizations by Manuel Novoa III. More efficient reading of huffman codes, a streamlined read_bunzip() function, and various other tweaks. In (limited) tests, approximately 20% faster than bzcat on x86 and about 10% faster on arm. Note that about 2/3 of the time is spent in read_unzip() reversing the Burrows-Wheeler transformation. Much of that time is delay resulting from cache misses.

Many of those cache misses could probably be avoided with cache-aware techniques, so even faster implementations should be possible.
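To illustrate where those cache misses come from, here is a minimal sketch of the inverse Burrows-Wheeler transform in Python. It is not the code from micro-bunzip.c; the function names `bwt` and `inverse_bwt` are made up for this example, and the naive forward transform is only there so the round trip can be checked. The point is the final loop: every step jumps to an essentially unpredictable index in the `nxt` table, which for a real 900 KB bzip2 block means a likely cache miss per output byte.

```python
from typing import List, Tuple


def bwt(data: bytes) -> Tuple[bytes, int]:
    """Naive forward BWT over rotations (illustration only, O(n^2 log n)).

    Returns the last column and the row index of the original string,
    which plays the role of bzip2's "origPtr".
    """
    n = len(data)
    doubled = data + data
    rows = sorted(range(n), key=lambda i: doubled[i:i + n])
    last = bytes(doubled[i + n - 1] for i in rows)
    return last, rows.index(0)


def inverse_bwt(last: bytes, orig_ptr: int) -> bytes:
    """Reverse the BWT using a 'next row' table built by counting sort."""
    n = len(last)
    if n == 0:
        return b""

    # Count each byte value, then turn counts into start offsets
    # of that byte in the (implicit) sorted first column.
    counts: List[int] = [0] * 256
    for byte in last:
        counts[byte] += 1
    starts: List[int] = [0] * 256
    total = 0
    for value in range(256):
        starts[value] = total
        total += counts[value]

    # nxt[j] = row whose rotation starts one position after row j's rotation.
    nxt: List[int] = [0] * n
    for i, byte in enumerate(last):
        nxt[starts[byte]] = i
        starts[byte] += 1

    # Pointer chasing: each lookup lands at an effectively random index,
    # which is the cache-unfriendly access pattern described above.
    out = bytearray(n)
    pos = nxt[orig_ptr]
    for k in range(n):
        out[k] = last[pos]
        pos = nxt[pos]
    return bytes(out)
```

Calling `inverse_bwt(*bwt(data))` should return `data` unchanged. Faster decompressors mostly attack the last loop, for example by packing the byte and the next pointer into one table entry so that each step touches a single cache line.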

This one (seek-bzip2) also has an interesting feature: easy seeking in the input file.

My program will consume the output of bzip2 and can (theoretically) do so in parallel on different parts of the file, so parallel bzip2 implementations are of interest too.
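As a rough sketch of the "different parts of the file in parallel" idea, the Python snippet below decompresses several independent bzip2 streams concurrently using only the standard library. It assumes the input has already been split into complete streams (a file written by a chunking compressor is a concatenation of such streams); locating block boundaries inside a single-stream file is a harder problem that this sketch does not attempt. The helper names `compress_in_chunks` and `decompress_parallel` are hypothetical, and the use of threads relies on CPython's bz2 module releasing the GIL while it decompresses.

```python
import bz2
from concurrent.futures import ThreadPoolExecutor
from typing import List


def compress_in_chunks(data: bytes, chunk_size: int) -> List[bytes]:
    """Compress each chunk as its own complete bzip2 stream (test helper)."""
    return [
        bz2.compress(data[i:i + chunk_size])
        for i in range(0, len(data), chunk_size)
    ]


def decompress_parallel(streams: List[bytes], workers: int = 4) -> bytes:
    """Decompress independent bzip2 streams in a thread pool.

    Executor.map yields results in input order, so the joined output
    matches the original byte order.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return b"".join(pool.map(bz2.decompress, streams))


if __name__ == "__main__":
    original = b"some repetitive payload " * 50_000
    parts = compress_in_chunks(original, chunk_size=200_000)
    assert decompress_parallel(parts) == original
    # The concatenated streams are still a valid .bz2 file for serial tools.
    assert bz2.decompress(b"".join(parts)) == original
```

A consumer that can work on parts independently does not need the final join at all: each worker can process its own decompressed chunk directly.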

Thanks.

Upvotes: 6

Views: 7528

Answers (3)

Flaviu

Reputation: 1039

lbzip2 is a good alternative.

sudo apt install lbzip2

lbzip2 -d <archive>

Upvotes: 4

Greg Sadetsky

Reputation: 5102

If you have access to multiprocessor machines with a lot of RAM (it's easy to spin up a multiprocessor virtual machine on Amazon EC2 or Digital Ocean), you should definitely check out PBZIP2:

PBZIP2 is a parallel implementation of the bzip2 block-sorting file compressor that uses pthreads and achieves near-linear speedup on SMP machines


To illustrate: I'm currently decompressing a large 17 GB file. bzip2 was writing the decompressed file at a rate of 10 MB/s; PBZIP2 is now writing it at 160 MB/s. I'm running it like this:

pbzip2 -v -d -k -m10000 file.bz2

i.e. -v verbose, -d decompress, -k keep the original file, -m10000 use up to about 10 GB of RAM (the value is given in MB)

This is running on a 20-CPU machine with 64 GB of RAM on Digital Ocean, which costs $0.952/hour. :-)

Upvotes: 0

osgx

Reputation: 94485

There is a bit of comparison at http://lists.debian.org/debian-mentors/2009/02/msg00135.html. Parallel versions are considered.

There is a bit more at http://realworldtech.com/forums/index.cfm?action=detail&id=98883&threadid=98430&roomid=2

Both links come from the article on Intel's Cilk-parallel version of bzip2: http://software.intel.com/en-us/articles/a-parallel-bzip2/

Also, Intel's IPP-powered bzip2 is rather good. IPP also tries (with negative effect) to parallelize some internals of bzip2 with OpenMP (Intel KMP 5), but it does not decompress blocks in parallel. When limited to one or two threads, 20 MB/s of decompressed output is achievable on a 2.4 GHz Core 2 (IPP "v8" code).

Hope this helps.

Upvotes: 4
