Reputation: 606
I am writing an Android application that uses the SoX library. The library puts a very heavy load on the ARM processor. Where can I read about how to optimize a library for ARM? Can anyone help?
Upvotes: 1
Views: 991
Reputation: 8735
You haven't specified your target hardware. Android devices range from low-end ARMv5E processors up to the latest Tegra3. If you want your code to run well on the largest variety of devices, you will need to support ARMv5 (which doesn't have NEON). Even the Tegra2 (currently the most popular CPU for Android tablets) is missing NEON support. You can address this in Android with a "fat binary", which contains both ARMv5 and ARMv7 code in a single APK.
Some general rules about optimizing ARM code:
1) ARMv5/ARMv6 processors have tiny caches - optimize your data set to fit in the smallest space and re-use buffers instead of constantly allocating/freeing them to avoid evicting them from the cache.
2) ARMv5/ARMv6 processors have only 4 write buffers. This means that in tight loops, writing bytes or shorts will run at about half the speed of writing longs due to tying up the write buffers.
3) For memory-bound data processing loops, prefetch the cache (PLD instruction). It can generally speed things up another 20-25%.
4) For code which manipulates bits/bytes, writing in ASM is usually a good idea since higher level languages don't do a great job of working with that type of data.
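To make rules 2 and 3 a bit more concrete, here is a rough C sketch (not taken from SoX). The function `halve_samples`, the `PREFETCH` macro and the 16-sample block size are all made up for illustration. It assumes a GCC or Clang toolchain, where `__builtin_prefetch` is normally turned into PLD on ARM targets that have it, and a little-endian target, as on Android ARM devices. Whether it actually helps depends on the device, so measure before keeping it.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: GCC/Clang builtin; other compilers get a no-op. */
#if defined(__GNUC__) || defined(__clang__)
#define PREFETCH(p) __builtin_prefetch(p)
#else
#define PREFETCH(p) ((void)0)
#endif

/* Assumption: 16 samples = 32 bytes, treated here as one cache line. */
#define BLOCK 16

/* Hypothetical helper: halve every 16-bit sample of src into dst. */
void halve_samples(int16_t *dst, const int16_t *src, size_t n)
{
    size_t i = 0;

    while (i + BLOCK <= n) {
        /* Rule 3: hint the next block while working on this one.
           Only done when the next block lies fully inside the buffer. */
        if (i + 2 * BLOCK <= n)
            PREFETCH(&src[i + BLOCK]);

        for (size_t j = 0; j < BLOCK; j += 2, i += 2) {
            /* Rule 2: pack two shorts and store them as one 32-bit word.
               Little-endian layout assumed: low half lands at dst[i]. */
            uint32_t lo = (uint16_t)(src[i] / 2);
            uint32_t hi = (uint16_t)(src[i + 1] / 2);
            uint32_t packed = lo | (hi << 16);
            memcpy(&dst[i], &packed, sizeof packed);
        }
    }

    /* Leftover samples that do not fill a whole block. */
    for (; i < n; i++)
        dst[i] = (int16_t)(src[i] / 2);
}
```

The fixed-size `memcpy` is there to avoid alignment and aliasing problems; optimizing compilers usually reduce it to a single word store, but check the generated assembly if it matters.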
Upvotes: 1
Reputation: 6354
I've been optimizing code in assembly for quite some time, starting with the MC68000 on the Amiga, then mainly ARM9E (ARMv5E). ARM11 was fine with its new SIMD-like instructions and saturation arithmetic. Then came Cortex.
You know what? NEON, which came bundled with the Cortex-A series, took away my whole motivation for optimizing for ARM.
Unoptimized NEON code runs roughly 5x faster out of the box than assembly-optimized ARM code, and it's so much easier than ARM itself: where I had to struggle hard to get things to work, NEON almost always has a fitting instruction that does exactly the same thing, or does it even more accurately, on multiple elements at once.
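As a small illustration of a "fitting instruction", here is a sketch of mixing two 16-bit PCM buffers with saturation. NEON's saturating add handles eight samples per instruction, while the plain C version has to clamp each sample by hand. The helper name `mix_saturating` is made up for this example. It uses the `vqaddq_s16` intrinsic from `arm_neon.h` only to stay in C; this answer argues for hand-written NEON assembly rather than intrinsics, so treat it as a picture of what the instruction does, not as the recommended route. On builds without NEON only the scalar loop is compiled.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: dst[i] = saturate16(a[i] + b[i]). */
void mix_saturating(int16_t *dst, const int16_t *a, const int16_t *b, size_t n)
{
    size_t i = 0;

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* Eight samples at a time: one saturating add (VQADD.S16) per vector. */
    for (; i + 8 <= n; i += 8) {
        int16x8_t va = vld1q_s16(a + i);
        int16x8_t vb = vld1q_s16(b + i);
        vst1q_s16(dst + i, vqaddq_s16(va, vb));
    }
#endif

    /* Scalar path: the remaining samples, or everything without NEON. */
    for (; i < n; i++) {
        int32_t s = (int32_t)a[i] + (int32_t)b[i];
        if (s > INT16_MAX) s = INT16_MAX;
        if (s < INT16_MIN) s = INT16_MIN;
        dst[i] = (int16_t)s;
    }
}
```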
I've read that ARM instruction timings changed a lot with Cortex, in addition to the new dual-issue capability, which means I would have to do many things differently than on ARM9 for maximum performance, but honestly, I don't care anymore. NEON is the way to go.
Bye-bye, ARM.
Don't waste your time on ARM, and especially not on NEON intrinsics. Start studying NEON instead.
An excellent introduction to NEON: http://bit.ly/8XzPXM
Upvotes: 2