C BZ2_bzDecompress way slower than bzip2 command

Question

I'm using mmap/read + BZ2_bzDecompress to sequentially decompress a large file (29GB). I need to parse the uncompressed XML data but only need small bits of it, so it seemed far more efficient to do this sequentially than to uncompress the whole file (400GB uncompressed) and then parse it. Interestingly, the decompression alone is already extremely slow: the shell command bzip2 manages a bit more than 52MB per second (I ran timeout 10 bzip2 -c -k -d input.bz2 > output several times and divided the size of the produced file by 10), while my program does not even reach 2MB/s and slows down to 1.2MB/s after a few seconds.

The file I'm trying to process consists of multiple bz2 streams, so I check the return value of BZ2_bzDecompress for BZ_STREAM_END. When it occurs and the file hasn't been completely processed, I call BZ2_bzDecompressEnd( strm ); and BZ2_bzDecompressInit( strm, 0, 0 ) to restart with the next stream. I also tried without BZ2_bzDecompressEnd, but that didn't change anything (and I can't really see in the documentation how multiple streams should be handled correctly).

The file is mmap'ed beforehand. I tried different combinations of flags there; currently I use PROT_READ with MAP_PRIVATE, plus madvise with MADV_SEQUENTIAL | MADV_WILLNEED | MADV_HUGEPAGE (I check the return value and madvise does not report any problems; I'm on a Debian setup with a Linux 3.2x kernel, which has hugepage support).
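One detail worth checking in a setup like this: the madvise advice values are plain integers rather than bit flags, so OR-ing them does not combine the hints. In the Linux headers I am aware of, MADV_SEQUENTIAL is 2, MADV_WILLNEED is 3 and MADV_HUGEPAGE is 14, and their bitwise OR is 15, which is MADV_NOHUGEPAGE; that would also be consistent with madvise reporting no error. A minimal sketch that issues one call per hint is shown below. map_readonly is a hypothetical helper name, and the sketch assumes Linux with glibc.

```c
#define _DEFAULT_SOURCE          /* for madvise() and the MADV_* constants on glibc */
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map a file read-only and apply read-ahead hints, one madvise() call per hint.
 * Returns MAP_FAILED on error; on success *len_out receives the mapping length. */
static void *map_readonly(const char *path, size_t *len_out)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return MAP_FAILED;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return MAP_FAILED;
    }

    size_t len = (size_t)st.st_size;
    void *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                   /* the mapping stays valid after close() */
    if (p == MAP_FAILED)
        return MAP_FAILED;

    static const int advice[] = {
        MADV_SEQUENTIAL,
        MADV_WILLNEED,
#ifdef MADV_HUGEPAGE
        MADV_HUGEPAGE,           /* may be rejected for file-backed mappings */
#endif
    };
    for (size_t i = 0; i < sizeof advice / sizeof advice[0]; i++) {
        /* Hints are advisory, so a failure is reported but not treated as fatal. */
        if (madvise(p, len, advice[i]) != 0)
            perror("madvise");
    }

    *len_out = len;
    return p;
}
```

The mapping is released with munmap(p, len) once decompression is finished.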

When profiling, I made sure that nothing else was running apart from some counters for measuring speed and a printf limited to once every n iterations. This is on a modern multicore server processor where all other cores were idle, and it's bare metal, not virtualized.
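The kind of measurement described above (a byte counter plus a printf every n iterations) could look roughly like this. It is only a sketch: struct meter, meter_start, meter_add, now_seconds and mb_per_sec are hypothetical names, and 1 MB is assumed to mean 10^6 bytes, which the question does not specify.

```c
#define _POSIX_C_SOURCE 200809L  /* for clock_gettime() */
#include <stdio.h>
#include <time.h>

/* Monotonic wall-clock time in seconds. */
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Throughput in MB/s, assuming 1 MB = 10^6 bytes. */
static double mb_per_sec(unsigned long long bytes, double seconds)
{
    return seconds > 0.0 ? (double)bytes / 1e6 / seconds : 0.0;
}

struct meter {
    double start;                /* time at which measuring began */
    unsigned long long bytes;    /* decompressed bytes seen so far */
    unsigned long calls;         /* number of meter_add() calls */
};

static void meter_start(struct meter *m)
{
    m->start = now_seconds();
    m->bytes = 0;
    m->calls = 0;
}

/* Add n bytes and print the average rate every `every` calls (must be nonzero). */
static void meter_add(struct meter *m, size_t n, unsigned long every)
{
    m->bytes += n;
    if (++m->calls % every == 0)
        fprintf(stderr, "%.1f MB/s\n",
                mb_per_sec(m->bytes, now_seconds() - m->start));
}
```

With the bzip2 figure from the question, 520,000,000 bytes produced in 10 seconds corresponds to mb_per_sec(520000000ULL, 10.0), that is 52 MB/s.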

Any ideas on what I could be doing wrong, or what I could do to improve performance?

Update: Thanks to James Chong's suggestion I tried "swapping" mmap() with read(), and the speed is still the same. So it seems mmap() is not the problem (either that, or mmap() and read() share an underlying problem)

Update 2: Thinking that the malloc/free calls made in BZ2_bzDecompressInit/BZ2_bzDecompressEnd might be the cause, I set bzalloc/bzfree of the bz_stream struct to a custom implementation which only allocates memory the first time and does not free it unless a flag is set (passed via the opaque parameter = strm.opaque). It works perfectly fine, but again the speed did not increase.
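A caching allocator of the kind described in Update 2 might be sketched as below. This is not the asker's implementation: struct alloc_pool, pool_alloc, pool_free, pool_destroy and POOL_SLOTS are hypothetical names. The two callbacks only mirror the signatures of the bzalloc and bzfree members of bz_stream, and the state is passed through opaque. Blocks are kept after "free" and handed out again on the next request that fits.

```c
#include <stdlib.h>

#define POOL_SLOTS 8             /* arbitrary; libbz2 needs only a few blocks per stream */

struct pool_slot {
    void *ptr;                   /* cached block, NULL if the slot is empty */
    size_t size;                 /* size of the cached block in bytes */
    int in_use;                  /* nonzero while the block is handed out */
};

struct alloc_pool {
    struct pool_slot slot[POOL_SLOTS];
};

/* Same signature as bz_stream.bzalloc: returns items * size bytes. */
static void *pool_alloc(void *opaque, int items, int size)
{
    struct alloc_pool *pool = opaque;
    size_t want = (size_t)items * (size_t)size;
    struct pool_slot *empty = NULL;

    for (int i = 0; i < POOL_SLOTS; i++) {
        struct pool_slot *s = &pool->slot[i];
        if (s->ptr && !s->in_use && s->size >= want) {
            s->in_use = 1;       /* reuse a cached block that is large enough */
            return s->ptr;
        }
        if (!s->ptr && !empty)
            empty = s;           /* remember the first empty slot */
    }
    if (!empty)
        return NULL;             /* pool exhausted; the caller sees an allocation failure */

    empty->ptr = malloc(want);
    if (!empty->ptr)
        return NULL;
    empty->size = want;
    empty->in_use = 1;
    return empty->ptr;
}

/* Same signature as bz_stream.bzfree: marks the block reusable, keeps the memory. */
static void pool_free(void *opaque, void *addr)
{
    struct alloc_pool *pool = opaque;
    if (!addr)
        return;
    for (int i = 0; i < POOL_SLOTS; i++) {
        if (pool->slot[i].ptr == addr) {
            pool->slot[i].in_use = 0;
            return;
        }
    }
}

/* Really release everything, e.g. after the last stream has been processed. */
static void pool_destroy(struct alloc_pool *pool)
{
    for (int i = 0; i < POOL_SLOTS; i++) {
        free(pool->slot[i].ptr);
        pool->slot[i].ptr = NULL;
        pool->slot[i].size = 0;
        pool->slot[i].in_use = 0;
    }
}
```

To use it, a zero-initialised struct alloc_pool is set up once, and strm.bzalloc = pool_alloc, strm.bzfree = pool_free and strm.opaque = &pool are assigned before each BZ2_bzDecompressInit call. That this made no difference to the speed suggests, without proving it, that allocation is not the bottleneck.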

Update 3: I also tried fread() instead of read(), and the speed still stays the same. I also tried different read sizes and different sizes for the decompressed-data buffer, with no change.

Update 4: Reading speed is definitely not an issue, as I've been able to achieve speeds close to about 120MB/s in sequential reading using just mmap().
