Reputation: 1259
I'm using mmap/read + BZ2_bzDecompress to sequentially decompress a large file (29GB). I need to parse the uncompressed XML data but only need small bits of it, so it seemed far more efficient to do this sequentially than to decompress the whole file (400GB uncompressed) and then parse it. Interestingly, the decompression alone is already extremely slow: the shell command bzip2 manages a bit more than 52MB per second (measured over several runs of timeout 10 bzip2 -c -k -d input.bz2 > output, dividing the produced file size by 10), while my program does not even reach 2MB/s and slows down after a few seconds to 1.2MB/s.
The file I'm trying to process contains multiple bz2 streams, so I check the return value of BZ2_bzDecompress for BZ_STREAM_END. If it occurs and the file hasn't been completely processed, I call BZ2_bzDecompressEnd( strm ); and BZ2_bzDecompressInit( strm, 0, 0 ) to restart with the next stream. I also tried without BZ2_bzDecompressEnd, but that didn't change anything (and I can't really see in the documentation how one should handle multiple streams correctly).
The file is mmap'ed beforehand. I tried different combinations of flags there as well; currently PROT_READ and MAP_PRIVATE, with madvise set to MADV_SEQUENTIAL | MADV_WILLNEED | MADV_HUGEPAGE (I'm checking the return value, and madvise does not report any problems; I'm on a Debian setup with a Linux 3.2x kernel, which has hugepage support).
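One detail worth checking in a setup like this: madvise takes a single advice value per call, and on Linux the MADV_* constants are plain integers rather than bit flags, so OR-ing several of them can silently select a different advice than intended without producing an error. A minimal sketch of issuing each hint separately is shown below; the helper name map_for_sequential_read is hypothetical, and the hints are treated as purely advisory.

```c
#define _DEFAULT_SOURCE
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map a file read-only and issue each madvise hint as its own call.
 * Returns the mapping (length stored in *len_out) or NULL on failure. */
static void *map_for_sequential_read(const char *path, size_t *len_out)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return NULL;
    }
    size_t len = (size_t)st.st_size;

    void *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                      /* the mapping stays valid after close */
    if (p == MAP_FAILED)
        return NULL;

    /* One advice value per call; failures are ignored because the hints
     * are only advisory and may not apply to every kind of mapping. */
    (void)madvise(p, len, MADV_SEQUENTIAL);
    (void)madvise(p, len, MADV_WILLNEED);
#ifdef MADV_HUGEPAGE
    (void)madvise(p, len, MADV_HUGEPAGE);
#endif

    *len_out = len;
    return p;
}
```

The caller is expected to release the mapping with munmap(p, len) when done.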
When profiling, I made sure that nothing else was run apart from some counters for measuring speed and a printf limited to once every n iterations. This is on a modern multicore server processor where all other cores were idle, and it's bare metal, not virtualized.
Any ideas on what I could be doing wrong / do to improve performance?
Update: Thanks to James Chong's suggestion I tried "swapping" mmap() with read(), and the speed is still the same. So it seems mmap() is not the problem (either that, or mmap() and read() share an underlying problem).
Update 2: Thinking that maybe the malloc/free calls done in bzDecompressInit/bzDecompressEnd would be the cause, I set bzalloc/bzfree of the bz_stream struct to a custom implementation which only allocates memory the first time and does not free it unless a flag is set (passed by the opaque parameter = strm.opaque). It works perfectly fine, but again the speed did not increase.
Update 3: I have now also tried fread() instead of read(), and the speed stays the same. I also tried different read sizes and different sizes for the decompressed-data buffer - no change.
Update 4: Reading speed is definitely not an issue, as I've been able to achieve speeds close to about 120MB/s in sequential reading using just mmap().
Upvotes: 4
Views: 718
Reputation: 1
Swapping and mmap flags have little to do with it. If bzip2 is slow, it is not because of the file I/O.
I think your libbz2 wasn't fully optimized. Recompile it with the most brutal gcc optimization flags you can imagine.
My second idea is that there may be some ELF linking overhead. In that case the problem will disappear if you link bz2 statically. (After that you can think about how to make it fast with a dynamically loaded libbz2.)
Important extension from the future: libbz2 must be reentrant, thread-safe and position-independent. This means it has to be compiled with various C flags, and these flags are not good for performance (dropping them can produce much faster code). In an extreme case I could even imagine a 5-10x slowdown compared to the single-threaded, non-PIC, non-reentrant version.
Upvotes: 1