Reputation: 397
My dev env is as follows:
Device: Nexus 5
Android: 4.4.2
SDK Tools: 22.6.1
Platform Tools: 19.0.1
Build tools: 19.0.3
Build Target: level 19
Min Target: level 19
I'm working on an image processing application. Basically I need to run a preprocessing step on the image and then filter it with a 5x5 convolution. I successfully got the preprocessing script to run on the GPU with good performance. Since RenderScript offers a 5x5 convolution intrinsic, I'd like to use it to make the whole pipeline as fast as possible. However, I found that running the 5x5 convolution intrinsic after the preprocessing step is very slow. In contrast, if I use adb to force all the scripts to run on the CPU, the 5x5 convolution intrinsic is a lot faster. In both cases, the time consumed by the preprocessing step is basically the same, so it is the performance of the intrinsic that makes the difference.
Also, in the code I use
Allocation.USAGE_SHARED
when creating all the Allocations, hoping that the shared memory would speed up memory access between the CPU and GPU.
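In case it helps, here is roughly what the setup looks like. This is only a sketch: ScriptC_preprocess and the coefficients array stand in for my actual preprocessing kernel and filter values.

    import android.content.Context;
    import android.graphics.Bitmap;
    import android.renderscript.Allocation;
    import android.renderscript.Element;
    import android.renderscript.RenderScript;
    import android.renderscript.ScriptIntrinsicConvolve5x5;

    void runPipeline(Context context, Bitmap inputBitmap, Bitmap outputBitmap,
                     float[] coefficients /* 25 filter taps */) {
        RenderScript rs = RenderScript.create(context);

        // All Allocations are created with USAGE_SHARED (plus USAGE_SCRIPT)
        // so the CPU and GPU can work on the same backing memory.
        Allocation inAlloc = Allocation.createFromBitmap(rs, inputBitmap,
                Allocation.MipmapControl.MIPMAP_NONE,
                Allocation.USAGE_SHARED | Allocation.USAGE_SCRIPT);
        Allocation tmpAlloc = Allocation.createTyped(rs, inAlloc.getType(),
                Allocation.USAGE_SHARED | Allocation.USAGE_SCRIPT);
        Allocation outAlloc = Allocation.createFromBitmap(rs, outputBitmap,
                Allocation.MipmapControl.MIPMAP_NONE,
                Allocation.USAGE_SHARED | Allocation.USAGE_SCRIPT);

        // Preprocessing step: custom script (runs on the GPU in my case).
        ScriptC_preprocess preprocess = new ScriptC_preprocess(rs);
        preprocess.forEach_root(inAlloc, tmpAlloc);

        // 5x5 convolution intrinsic applied to the preprocessed image.
        ScriptIntrinsicConvolve5x5 convolve =
                ScriptIntrinsicConvolve5x5.create(rs, Element.U8_4(rs));
        convolve.setCoefficients(coefficients);
        convolve.setInput(tmpAlloc);
        convolve.forEach(outAlloc);

        // Copying the result back forces the pipeline to complete.
        outAlloc.copyTo(outputBitmap);
    }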
Since I understand that intrinsics run on the CPU, is this behavior expected? Or did I miss anything? Is there a way to make the mixed GPU script / CPU intrinsic code fast? Thanks a lot!
Upvotes: 0
Views: 322
Reputation: 672
The 5x5 convolve intrinsic (in the default Android RenderScript CPU driver) uses NEON. This is extremely fast, and my measurements confirmed it as well. In general, I did not find any RenderScript API that performs a 5x5 convolve on two 5x5 matrices, which is a problem because it prevents one from writing more complex kernels.
Given the performance differences you are noticing, it is quite possible that the GPU driver on your device supports a 5x5 convolve intrinsic that runs slower than the NEON-based CPU 5x5 convolve intrinsic, so forcing RenderScript onto the CPU gives you better performance.
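If you want to double-check where the time goes, keep in mind that forEach() launches asynchronously, so you need to force a sync (for example by copying the result out) before stopping the timer. A rough sketch, reusing the convolve intrinsic and allocations from your setup:

    // Rough timing sketch; forEach() is asynchronous, so copy the result
    // out (or otherwise sync) before reading the clock.
    long start = System.nanoTime();
    convolve.setInput(tmpAlloc);
    convolve.forEach(outAlloc);
    outAlloc.copyTo(outputBitmap);  // forces the convolve to finish
    long elapsedMs = (System.nanoTime() - start) / 1000000;
    Log.d("Convolve5x5", "5x5 convolve took " + elapsedMs + " ms");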
Upvotes: 1