Reputation: 2598
We are using libjpeg for JPEG decoding on our small embedded platform. Decoding is slow for large images. For example, a 20 MB image with dimensions of 5000x3000 pixels takes 10 seconds to load.
I need some tips on how to improve decoding speed. On another platform with similar performance, the same image loads in two seconds.
The best improvement we got, from 14 seconds down to 10 seconds, came from using a larger read buffer (64 kB instead of the default 4 kB). Nothing else helped.
We do not need to display the image in full resolution, so we use scale_num and scale_denom to decode it at a smaller size. But I would like more performance. Is it possible to use some kind of multithreading? Different decoding settings? Anything helps, I have run out of ideas.
Upvotes: 4
Views: 4893
Reputation: 3730
Take a look at libjpeg-turbo. If you have supported hardware, it is generally 2-4 times faster than libjpeg on the same CPU. A typical 12 MB JPEG is decoded in less than 2 seconds on a PandaBoard. You can also take a look at a speed analysis of various JPEG decoders here.
Upvotes: 2
Reputation: 93476
Multi-threading could only help the decode process if the target had multiple execution units for true concurrent execution. Otherwise it will just time-slice existing CPU resources. It won't help in any case unless the library is designed to make use of it.
If you built the library from source, first make sure you built it with optimisation switched on, and carefully select the compiler options to match the build to your target and its instruction set, so that the compiler can use SIMD or an FPU, for example.
You also need to consider other possible bottlenecks. Is the 10 seconds just the time to decode, or does it include the time to read from a filesystem or network, for example? Given the improvement you observed when you increased the read buffer size, it seems highly likely that the data read, rather than the decode, is the limiting factor in this case.
If filesystem access is in fact the limiting factor rather than the decode, then there may be some benefit in moving the file read into a separate thread and passing the data to the decoder via a pipe, a queue, or multiple shared memory buffers. The decoder can then stream the decode without having to wait on blocking filesystem calls.
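The reader-thread idea can be sketched roughly as follows. This is a minimal illustration using POSIX threads, not code from libjpeg: `chunk_queue`, `queue_init`, `reader_thread`, `queue_pop`, `CHUNK_SIZE` and `NUM_CHUNKS` are all hypothetical names, and it assumes the platform has pthreads at all. The blocking `fread` happens outside the lock, so the consumer can keep draining chunks that are already filled while the next one is being read.

```c
#include <pthread.h>
#include <stdio.h>
#include <string.h>

#define CHUNK_SIZE 65536   /* assumed chunk size, matches the 64 kB buffer */
#define NUM_CHUNKS 4       /* assumed number of buffers in flight */

/* Hypothetical bounded queue of file chunks shared by reader and decoder. */
typedef struct {
    unsigned char data[NUM_CHUNKS][CHUNK_SIZE];
    size_t len[NUM_CHUNKS];
    int head, tail, count;  /* head: next slot to fill, tail: next to drain */
    int eof;                /* set by the reader when the file is exhausted */
    FILE *fp;
    pthread_mutex_t lock;
    pthread_cond_t not_empty, not_full;
} chunk_queue;

void queue_init(chunk_queue *q, FILE *fp)
{
    q->head = q->tail = q->count = q->eof = 0;
    q->fp = fp;
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->not_empty, NULL);
    pthread_cond_init(&q->not_full, NULL);
}

/* Producer: fills free slots from the file until EOF or a read error. */
void *reader_thread(void *arg)
{
    chunk_queue *q = arg;
    for (;;) {
        pthread_mutex_lock(&q->lock);
        while (q->count == NUM_CHUNKS)
            pthread_cond_wait(&q->not_full, &q->lock);
        int slot = q->head;  /* free, because count < NUM_CHUNKS */
        pthread_mutex_unlock(&q->lock);

        /* Blocking read happens without holding the lock. */
        size_t n = fread(q->data[slot], 1, CHUNK_SIZE, q->fp);

        pthread_mutex_lock(&q->lock);
        if (n == 0) {
            q->eof = 1;
        } else {
            q->len[slot] = n;
            q->head = (q->head + 1) % NUM_CHUNKS;
            q->count++;
        }
        pthread_cond_signal(&q->not_empty);
        pthread_mutex_unlock(&q->lock);
        if (n == 0)
            return NULL;
    }
}

/* Consumer: copies the next chunk into dst (at least CHUNK_SIZE bytes).
 * Returns the number of bytes copied, or 0 once the file is exhausted. */
size_t queue_pop(chunk_queue *q, unsigned char *dst)
{
    pthread_mutex_lock(&q->lock);
    while (q->count == 0 && !q->eof)
        pthread_cond_wait(&q->not_empty, &q->lock);
    if (q->count == 0) {
        pthread_mutex_unlock(&q->lock);
        return 0;
    }
    size_t n = q->len[q->tail];
    memcpy(dst, q->data[q->tail], n);
    q->tail = (q->tail + 1) % NUM_CHUNKS;
    q->count--;
    pthread_cond_signal(&q->not_full);
    pthread_mutex_unlock(&q->lock);
    return n;
}
```

In a libjpeg program, `queue_pop` would most likely be called from the `fill_input_buffer` callback of a custom source manager, with the reader started via `pthread_create(&tid, NULL, reader_thread, &q)`. Whether this helps depends on the read and the decode actually being able to overlap on your hardware.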
Upvotes: 2
Reputation: 6005
First - profile the code. You're left with little more than speculation if you cannot definitively identify the bottlenecks.
Next, scour the documentation for libjpeg speedup opportunities. You mentioned scale_num and scale_denom. What about the decompressor's dct_method? I've found that the JDCT_FASTEST option is good. There are other options to check: do_fancy_upsampling, do_block_smoothing, dither_mode, two_pass_quantize, etc. Some or all of these may be useful to you, depending on your system, libjpeg version, etc.
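As an illustration of the scaling option, here is a small sketch of how the denominator might be chosen so that the decoder does as little work as possible while still producing an image at least as large as the display. `pick_scale_denom` and `div_round_up` are hypothetical helpers, not part of libjpeg, and the sketch assumes only the 1/1, 1/2, 1/4 and 1/8 ratios are available (newer libjpeg versions and libjpeg-turbo accept more).

```c
#include <stdio.h>

/* Ceiling division, so a scaled dimension is never rounded down to zero. */
static unsigned div_round_up(unsigned a, unsigned b)
{
    return (a + b - 1) / b;
}

/* Hypothetical helper: returns the largest denominator in {8, 4, 2, 1}
 * for which the scaled image still covers want_w x want_h.
 * Returns 1 (no downscaling) if no reduction keeps the image big enough. */
unsigned pick_scale_denom(unsigned src_w, unsigned src_h,
                          unsigned want_w, unsigned want_h)
{
    for (unsigned denom = 8; denom > 1; denom /= 2) {
        if (div_round_up(src_w, denom) >= want_w &&
            div_round_up(src_h, denom) >= want_h)
            return denom;
    }
    return 1;
}
```

The result would be applied after `jpeg_read_header()` and before `jpeg_start_decompress()`, for example `cinfo.scale_num = 1; cinfo.scale_denom = pick_scale_denom(cinfo.image_width, cinfo.image_height, 800, 480);`, where 800x480 is an assumed display size. The other fields mentioned above (`cinfo.dct_method = JDCT_FASTEST;`, `cinfo.do_fancy_upsampling = FALSE;` and so on) are set at the same point.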
If profiling tools are unavailable, there are still some things to try. First, I suspect your bottlenecks are not CPU related. To confirm, load the entire JPEG file (still compressed) into a RAM buffer, then decompress it from there as you have been. Did that significantly improve the decompression time? If so, the culprit would appear to be the read operation from your image storage media. Depending on your system, reading from USB (or SD, etc.) can be slow. (Note that I'm assuming a read from external media, since hardware details are scant.) Be sure to optimize the relevant bus parameters as well (SPI clocks, configurations, etc.).
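A rough sketch of that experiment: read the whole file into memory first and time that step on its own, so the read cost can be compared with the decode cost. `now_seconds` and `measure_read_seconds` are hypothetical helpers, and the sketch assumes a C11 library that provides `timespec_get` and enough RAM to hold the file.

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Wall-clock time in seconds, using C11 timespec_get. */
static double now_seconds(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Hypothetical helper: reads the whole file at path into a malloc'd buffer.
 * On success stores the buffer and its length and returns the elapsed
 * read time in seconds. Returns a negative value on failure.
 * The caller frees *buf_out. */
double measure_read_seconds(const char *path,
                            unsigned char **buf_out, size_t *len_out)
{
    double start = now_seconds();

    FILE *fp = fopen(path, "rb");
    if (!fp)
        return -1.0;
    if (fseek(fp, 0, SEEK_END) != 0) { fclose(fp); return -1.0; }
    long size = ftell(fp);
    if (size < 0) { fclose(fp); return -1.0; }
    rewind(fp);

    unsigned char *buf = malloc(size > 0 ? (size_t)size : 1);
    if (!buf) { fclose(fp); return -1.0; }

    size_t got = fread(buf, 1, (size_t)size, fp);
    fclose(fp);
    if (got != (size_t)size) { free(buf); return -1.0; }

    *buf_out = buf;
    *len_out = got;
    return now_seconds() - start;
}
```

With the file in memory, the buffer can be handed to the decoder through `jpeg_mem_src(&cinfo, buf, len)` on libjpeg 8 or later and on libjpeg-turbo; older versions need a custom source manager. Timing the decode the same way then shows which of the two steps dominates the 10 seconds.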
If you are reading from something like internal flash (e.g. NAND), there are some other things to inspect. How is your NAND controller configured? Have you ensured that the controller is configured for the fastest operation? Check wait states, timings, etc. Note that bus and/or memory contention can be an issue too, so inspect their respective configurations as well.
Finally, if you believe your system is actually CPU bound, this Stack Overflow question may be of interest: Can a high-performance jpeglib-turbo implmentation decompress/compress in <100ms?
Upvotes: 4