Reputation: 704
I successfully converted a TensorFlow model to a TensorFlow Lite float16 model following the Post-training float16 quantization guide.
Below is a diagram of the converted model.
I ran it successfully on a MatePad Pro (Kirin 990) with my C++ code.
What I added specifically for NNAPI is SetAllowFp16PrecisionForFp32 and UseNNAPI, both before AllocateTensors:
m_interpreter->SetAllowFp16PrecisionForFp32(true);
m_interpreter->UseNNAPI(true);
m_interpreter->AllocateTensors();
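For reference, roughly the same setup can also be written with the NNAPI delegate API instead of the legacy UseNNAPI call. This is only a rough sketch; the Options fields (allow_fp16, accelerator_name, disallow_nnapi_cpu) and the header path may differ between TensorFlow Lite versions, and the accelerator name "liteadapter" is just my guess based on the logcat output below.

#include "tensorflow/lite/delegates/nnapi/nnapi_delegate.h"

tflite::StatefulNnApiDelegate::Options options;
options.allow_fp16 = true;                 // relax fp32 ops to fp16
options.accelerator_name = "liteadapter";  // assumption: Huawei's NPU driver name
options.disallow_nnapi_cpu = true;         // fail instead of falling back to nnapi-reference
tflite::StatefulNnApiDelegate nnapi_delegate(options);
if (m_interpreter->ModifyGraphWithDelegate(&nnapi_delegate) != kTfLiteOk) {
  // The driver could not compile the delegated partition(s).
}
m_interpreter->AllocateTensors();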
But the performance is not good.
I checked the logs with adb logcat
and found that both armnn and liteadapter, which I believe are Huawei's NNAPI drivers, fail to support major operations such as CONV_2D, so nnapi-reference, the CPU implementation of NNAPI, executes as a fallback.
The messages look like the one below.
AndroidNN: AnnOpConvParser::isSupport1_1(280)::"Conv para is model Input err"
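A node-by-node dump of the execution plan also makes it easy to see which operations were actually delegated and which stayed on the TFLite CPU kernels. A minimal sketch, assuming the usual interpreter headers (the exact API may vary by TensorFlow Lite version):

#include <cstdio>
#include "tensorflow/lite/schema/schema_generated.h"  // tflite::EnumNameBuiltinOperator

for (int node_index : m_interpreter->execution_plan()) {
  const auto* node_and_reg = m_interpreter->node_and_registration(node_index);
  const TfLiteNode& node = node_and_reg->first;
  const TfLiteRegistration& reg = node_and_reg->second;
  const char* op_name = reg.custom_name
      ? reg.custom_name  // delegate kernels show up as custom ops, e.g. "TfLiteNnapiDelegate"
      : tflite::EnumNameBuiltinOperator(
            static_cast<tflite::BuiltinOperator>(reg.builtin_code));
  printf("node %d: %s delegated=%d\n", node_index, op_name, node.delegate != nullptr);
}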
Why do the NNAPI drivers other than nnapi-reference fail to support these operations?
And how can I fix it?
I suspect that the Dequantize operations in the converted model should not be there and that each operation should take float16 parameters directly.
I don't know whether my guess is right, and even if it is, I have no idea how to eliminate the Dequantize operations.
(And of course, I tried a float32 converted model. The outputs of the float32 model were quite different between SetAllowFp16PrecisionForFp32(false) and SetAllowFp16PrecisionForFp32(true),
so I concluded that float16 quantization is needed for NNAPI.)
Below is a summary of my observations,
assuming setUseNNAPI(true):
Please give me some advice!
Upvotes: 1
Views: 595
Reputation: 704
I found that the reasons why it did not run on the NPU were the following:
float16 quantization prevents it.
An unsupported operation may cause not only CPU fallback of that operation but also failure of compilation of the whole model.
A simpler model runs on the NPU without any change to the code.
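To check which NNAPI accelerators the device actually exposes, you can enumerate them with the NDK NeuralNetworks API (available from API level 29). A minimal sketch; on this device the list presumably includes "liteadapter" (the NPU), "armnn" and "nnapi-reference":

#include <cstdio>
#include <android/NeuralNetworks.h>

uint32_t device_count = 0;
ANeuralNetworks_getDeviceCount(&device_count);
for (uint32_t i = 0; i < device_count; ++i) {
  ANeuralNetworksDevice* device = nullptr;
  ANeuralNetworks_getDevice(i, &device);
  const char* name = nullptr;
  ANeuralNetworksDevice_getName(device, &name);
  printf("NNAPI device %u: %s\n", i, name);
}

The reported name can then be passed to the NNAPI delegate's accelerator_name option to pin execution to the NPU instead of silently falling back to the CPU.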
Upvotes: 1