RaresV

Reputation: 31

Using KissFFT to create features for tflite-micro audio classification

I am trying to run audio classification using tflite-micro on ESP32, with fixed point calculations.

The model is created using Keras, then converted to TFLite and quantized to uint8. Cross-validating the Keras model against the TFLite model in Python yields good results.

My code structure is: audio capture (int16) --> create spectrogram using STFT (int16 -> uint16) --> quantize (uint16 -> uint8) --> run inference.

The STFT is computed on the ESP32 with KissFFT, using 16-bit int input.

My problem is that I can't figure out how to do the scaling and quantization in fixed point so that the values match what the model receives as input in Python for the same audio file.
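For the uint16 -> uint8 step, the target is set by the quantization parameters baked into the converted model. A minimal sketch of applying them in Python (the scale and zero point values below are hypothetical placeholders; the commented lines show where the real ones come from):

```python
import numpy as np

# The (scale, zero_point) pair comes from the converted model; in Python:
#   import tensorflow as tf
#   interpreter = tf.lite.Interpreter(model_path="model.tflite")  # hypothetical filename
#   scale, zero_point = interpreter.get_input_details()[0]["quantization"]
scale, zero_point = 0.05, 128  # hypothetical values for illustration

def quantize_uint8(features):
    """Map real-valued spectrogram features to uint8 using the model's params."""
    q = np.round(features / scale + zero_point)
    return np.clip(q, 0, 255).astype(np.uint8)

features = np.array([0.0, 1.0, 6.35])
print(quantize_uint8(features))  # -> [128 148 255]
```

The same formula, q = value / scale + zero_point, has to be reproduced in the fixed-point path on the ESP32, which is why any extra scaling in the FFT output shows up directly in the quantized input.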

The FFT for a window looks appropriate (see the attached picture).

[Figure: FFT comparison]

I tried different formulas for scaling the KissFFT output into the same range as the Octave/Python values, but none of them made the inference output equivalent.

Any thoughts on how I should scale the data?

Upvotes: 1

Views: 229

Answers (1)

Jon Nordby

Reputation: 6289

First, make sure you use the same FFT length. The shapes of the values look quite similar, so it is plausible that the difference is only a scaling factor, as you suspect. Divide one spectrum by the other, bin by bin; that gives you candidates for the scaling factor. Check their distribution and pick the most common value (the median is a robust choice).
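That procedure can be sketched in a few lines of NumPy. The arrays here are synthetic stand-ins for the two spectra; in practice `ref` would be the Python/Octave spectrum and `dut` the KissFFT output for the same window:

```python
import numpy as np

# Hypothetical data standing in for the two spectra of the same window
rng = np.random.default_rng(0)
ref = rng.uniform(1.0, 100.0, size=256)   # reference spectrum (Python/Octave)
dut = ref / 512.0                          # pretend fixed-point output, scaled down by the FFT length

ratios = ref / dut            # one candidate scaling factor per bin
factor = np.median(ratios)    # robust against a few outlier bins
print(factor)                 # -> 512.0
```

If the ratios cluster tightly around one value, a single multiplicative correction is enough; a spread that varies with frequency would point to something other than pure scaling.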

For fixed-point, implementations sometimes apply internal scaling to keep values inside the chosen number representation, so there are several possible factors.

For FFT it is also rather common to divide the output by the FFT length. It could be that one of your implementations does this and the other does not.

Upvotes: 0
