Reputation: 31
I converted a network to TFLite using the DEFAULT optimization (Float32) setting, and its inference speed is around 25 fps. When I converted the same network to TFLite with INT8 quantization, its inference speed dropped to around 2 fps on an 8-core Intel Core i9 @ 2.3 GHz. Is this expected on a CPU? Can somebody please explain what causes the slowness of INT8 inference?
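For context, both conversions were done along these lines (a minimal sketch, not the exact script; `model` and `calibration_samples` are placeholders for the real Keras model and sample inputs):

```python
import numpy as np
import tensorflow as tf

# Float32 / DEFAULT-optimization conversion.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
open("model_float32.tflite", "wb").write(converter.convert())

# Full INT8 quantization: needs a representative dataset for calibration.
def representative_dataset():
    for sample in calibration_samples:
        yield [np.asarray(sample, dtype=np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8
open("model_int8.tflite", "wb").write(converter.convert())
```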
Upvotes: 2
Views: 967
Reputation: 3416
You won't see a speed boost from INT8 models compared to Float32 on Intel CPUs older than 10th gen. This is because Intel CPUs before 10th gen don't have Intel DL Boost, an instruction set architecture (ISA) extension specifically designed to speed up INT8 DL models. It is present in Intel chips from 10th gen onwards. Almost certainly, without a dedicated INT8 ISA the quantized operations fall back to a slower path (effectively being upcast to Float32).
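If you want to verify whether your CPU exposes these instructions, one quick check (assuming the third-party py-cpuinfo package, `pip install py-cpuinfo`) is to look for the AVX-512 VNNI flag:

```python
import cpuinfo  # third-party package: py-cpuinfo

flags = cpuinfo.get_cpu_info().get("flags", [])
# DL Boost shows up as an AVX-512 VNNI flag (e.g. "avx512_vnni").
print("DL Boost (VNNI) available:", any("vnni" in f for f in flags))
```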
Upvotes: 1
Reputation: 74
Can you provide more details about the model?
A quantized model is certainly smaller than its Float32 counterpart.
When deploying on mobile CPUs, the quantized model is commonly faster; however, that is not guaranteed on Intel desktop/laptop CPUs.
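To compare the two models directly on your machine, a minimal latency benchmark with the TFLite Python interpreter could look like this (file names and thread count are placeholders):

```python
import time
import numpy as np
import tensorflow as tf

def benchmark(tflite_path, runs=100):
    interpreter = tf.lite.Interpreter(model_path=tflite_path, num_threads=8)
    interpreter.allocate_tensors()
    inp = interpreter.get_input_details()[0]
    shape = tuple(inp["shape"])

    # Random input matching the model's input shape and dtype.
    if np.issubdtype(inp["dtype"], np.integer):
        dummy = np.random.randint(0, 127, size=shape, dtype=inp["dtype"])
    else:
        dummy = np.random.random_sample(shape).astype(inp["dtype"])

    interpreter.set_tensor(inp["index"], dummy)
    interpreter.invoke()  # warm-up run

    start = time.perf_counter()
    for _ in range(runs):
        interpreter.invoke()
    avg_ms = (time.perf_counter() - start) / runs * 1000
    print(f"{tflite_path}: {avg_ms:.1f} ms per inference")

benchmark("model_float32.tflite")  # placeholder file names
benchmark("model_int8.tflite")
```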
Upvotes: 1