Java - Questions about estimating fundamental frequency

Question

I'm trying to estimate the fundamental frequency from a .wav file that contains a recording of a single spoken word.

What I've tried is reading the file with an AudioInputStream. The format is PCM_SIGNED 44100.0 Hz, 16 bit, stereo, 4 bytes/frame, little-endian.

Therefore I have made a new buffer to contain just one channel. This code achieves that:

double[] audioLeft = new double[audioBytes.length/2];
// copy the first two bytes of every 4-byte frame, i.e. the raw bytes of one channel
for (int i = 0, k = 0; i <= audioBytes.length-1; i += 4, k += 2) {
    audioLeft[k] = audioBytes[i];
    audioLeft[k+1] = audioBytes[i+1];
}

Then the data is copied into an fftBuffer, which is twice the size, and a DFT is applied. The library used is JTransforms; the function used is realForwardFull.

DoubleFFT_1D fftDo= new DoubleFFT_1D(audioLeft.length);
double[] fftBuffer = new double [audioLeft.length*2];

for (int i = 0; i < audioLeft.length; i++){
     fftBuffer[i] = audioLeft[i];
}
fftDo.realForwardFull(fftBuffer);

This gives an array of complex numbers, and I calculate the magnitude (amplitude) of each one in order to build a spectrum.

The formula used to get the amplitude is Amplitude = sqrt(IM*IM + RE*RE).

This provides an array of amplitudes to which I apply the harmonic summation method. Harmonic summation picks the index whose amplitude, added to the amplitudes of its next three harmonics, gives the highest sum; that index represents the fundamental frequency.
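A minimal sketch of that magnitude step, assuming the buffer holds interleaved pairs (Re[k] at index 2k, Im[k] at index 2k+1), which is how I understand realForwardFull lays out its result. The class and method names are made up for illustration. For real-valued input the upper half of the bins mirrors the lower half, so only the first N/2 bins are kept.

```java
class Spectrum {
    /**
     * Computes the magnitude of each complex bin in an interleaved buffer
     * (assumed layout: fftBuffer[2k] = Re, fftBuffer[2k+1] = Im).
     * Only the bins below the Nyquist frequency are returned.
     */
    static double[] magnitudes(double[] fftBuffer) {
        int n = fftBuffer.length / 2; // number of complex bins
        int half = n / 2;             // bins below Nyquist
        double[] amp = new double[half];
        for (int k = 0; k < half; k++) {
            double re = fftBuffer[2 * k];
            double im = fftBuffer[2 * k + 1];
            amp[k] = Math.hypot(re, im); // sqrt(re*re + im*im)
        }
        return amp;
    }
}
```

With this layout the amplitude array has half as many entries as there were input samples, which matters when an index is later converted to a frequency.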

double top_sum = 0;
double first_index = 0;
double sum = 0;
double f_0 = 0;
double FR = audioInputStream.getFormat().getSampleRate()/2/ampBuffer.length;

// for each candidate index, sum its amplitude and those of its next three harmonics
for (int i = 50; i <= ampBuffer.length/4-1; i++){
    sum = ampBuffer[i]+ampBuffer[i*2]+ampBuffer[i*3]+ampBuffer[i*4];
    if (top_sum < sum){
        top_sum = sum;
        first_index = i;
    }
}
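For comparison, here is a hedged, self-contained sketch of the same harmonic summation idea with explicit bounds. The names (HarmonicSum, bestBin, minBin, maxBin) are hypothetical, and restricting the search to a plausible pitch range is an assumption of the sketch, not something the code above does.

```java
class HarmonicSum {
    /**
     * Returns the bin in [minBin, maxBin] whose amplitude plus the amplitudes
     * of its multiples (up to `harmonics` terms in total) is largest,
     * or -1 if the range is empty.
     */
    static int bestBin(double[] amp, int minBin, int maxBin, int harmonics) {
        int best = -1;
        double bestSum = Double.NEGATIVE_INFINITY;
        // the highest harmonic index must stay inside the array
        int last = Math.min(maxBin, (amp.length - 1) / harmonics);
        for (int i = Math.max(1, minBin); i <= last; i++) {
            double sum = 0;
            for (int h = 1; h <= harmonics; h++) {
                sum += amp[i * h];
            }
            if (sum > bestSum) {
                bestSum = sum;
                best = i;
            }
        }
        return best;
    }
}
```

Calling bestBin(amp, minBin, maxBin, 4) corresponds to the index + 3 harmonics sum described above.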

This index, however, needs to be mapped back to a frequency. To my understanding, that is done with (index / fftBuffer.length) * sampleRate.
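A small sketch of the usual index-to-frequency mapping, where the divisor is the transform length N (the number of samples passed to the FFT). The helper name is hypothetical. Note that in the code above fftBuffer has 2N entries because it stores real and imaginary parts, so dividing by fftBuffer.length would halve the result.

```java
class BinFrequency {
    /**
     * Converts a DFT bin index to a frequency in Hz.
     * fftLength is the number of time-domain samples that were transformed.
     */
    static double binToHz(int binIndex, double sampleRate, int fftLength) {
        return binIndex * sampleRate / fftLength;
    }
}
```

For example, with one second of audio at 44100 Hz (N = 44100), each bin is 1 Hz wide, so bin 300 corresponds to 300 Hz.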

This provides an estimate of the fundamental frequency.

The result, however, is not correct. I have several different .wav files to test on, and with most of them the result is way outside the expected range. For the same female voice, three different words give the results 40, 13 and 360. All three results are expected to be in the range of approximately 250 to 350.

I think one cause may be the values in the amplitude buffer. When plotted, the graph doesn't show any clear peaks that represent the harmonics.

Here's an image of the graph: (image not available)

I know this was a lot of information, but I believe more information makes it easier to understand what has been done.

RECAP: What I am unsure of is the amplitude data. Do the values make sense? Are they plotted correctly? Do I need to do something with the data before I search it for the harmonics and find the fundamental frequency?

I have considered applying some kind of windowing, because I suspect that leakage might be why the peaks the plot does have aren't harmonics of each other.
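As one illustration of what such windowing could look like, here is a minimal sketch that applies a Hann window to the samples before the transform. The Hann window is just one common choice, and the class name is hypothetical.

```java
class HannWindow {
    /** Returns a copy of the samples multiplied by a symmetric Hann window. */
    static double[] apply(double[] samples) {
        int n = samples.length;
        if (n < 2) {
            return samples.clone(); // window is undefined for fewer than 2 points
        }
        double[] out = new double[n];
        for (int i = 0; i < n; i++) {
            // w rises from 0 at the edges to 1 in the middle of the frame
            double w = 0.5 * (1.0 - Math.cos(2.0 * Math.PI * i / (n - 1)));
            out[i] = samples[i] * w;
        }
        return out;
    }
}
```

The windowed array would then be copied into the FFT buffer in place of the raw samples.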

Any help or suggestions would be appreciated. In advance, thank you for your help!

EDIT: Here is my attempt at what was suggested:

ByteBuffer buf = ByteBuffer.wrap(audioBytes);
buf.order(ByteOrder.LITTLE_ENDIAN);
double[] audio = new double[audioBytes.length/2];

for (int i = 0; i < audioBytes.length/2; i++) {
    short s = buf.getShort();
    double mono = (double) s;
    double mono_norm = mono / 32768.0;

    audio[i] = mono_norm;
}

Now one channel of the PCM data should be saved in the array audio[].
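One thing to check: the loop above reads every 16-bit sample in the buffer, so with a stereo file the array would hold left and right samples interleaved rather than a single channel. A hedged sketch that keeps only the first channel of each frame follows, assuming the 4 bytes/frame, 16-bit little-endian format stated earlier and the usual WAV convention that the left channel comes first. PcmDecoder and leftChannel are illustrative names.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class PcmDecoder {
    /**
     * Decodes 16-bit little-endian stereo PCM and returns the first (left)
     * channel, scaled to the range [-1, 1).
     */
    static double[] leftChannel(byte[] audioBytes) {
        final int bytesPerFrame = 4; // 2 channels * 2 bytes per sample
        int frames = audioBytes.length / bytesPerFrame;
        ByteBuffer buf = ByteBuffer.wrap(audioBytes).order(ByteOrder.LITTLE_ENDIAN);
        double[] left = new double[frames];
        for (int i = 0; i < frames; i++) {
            short l = buf.getShort(); // first sample of the frame
            buf.getShort();           // skip the second channel's sample
            left[i] = l / 32768.0;
        }
        return left;
    }
}
```

The returned array has one entry per frame, so its length is audioBytes.length / 4.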
