robinyapockets

Reputation: 393

Find Average Voice Frequency/Range through Microphone (AVAudioPCMBuffer and FFT)

I'm trying to figure out the average frequency or frequency range of a person's voice as they speak into the microphone. It does not have to be real time. My approach so far was to use AVAudioEngine and AVAudioPCMBuffer, get the buffer data, and run an FFT on it.

inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { (buffer: AVAudioPCMBuffer, when: AVAudioTime) in
    self.recognitionRequest?.append(buffer)

    // Copy the first channel's samples into a Swift array.
    guard let data = buffer.floatChannelData?[0] else { return }
    let arrayOfData = Array(UnsafeBufferPointer(start: data, count: Int(buffer.frameLength)))
    let fftData = self.performFFT(arrayOfData)
}
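The tap's `bufferSize` is only a requested size, so `buffer.frameLength` is not guaranteed to be 1024, while a radix-2 FFT wants a power-of-two length. Since the analysis does not have to be real time, one option is to collect samples across tap callbacks and process them in fixed-size frames. This is a minimal sketch; `FrameAccumulator` is a hypothetical helper, not an AVFoundation type, and it is not thread-safe, so keep it confined to the tap callback.

```swift
/// Hypothetical helper: collects samples from tap callbacks of any size
/// and emits consecutive, non-overlapping frames of exactly `frameSize` samples.
struct FrameAccumulator {
    let frameSize: Int
    private var pending: [Float] = []

    init(frameSize: Int) {
        self.frameSize = frameSize
    }

    /// Appends new samples and returns every complete frame now available.
    /// Leftover samples are kept until the next call.
    mutating func append(_ samples: [Float]) -> [[Float]] {
        pending.append(contentsOf: samples)
        var frames: [[Float]] = []
        while pending.count >= frameSize {
            frames.append(Array(pending.prefix(frameSize)))
            pending.removeFirst(frameSize)
        }
        return frames
    }
}
```

In the tap you would call `append(arrayOfData)` and run the analysis on each returned frame.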




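import Accelerate

// Returns the single-sided amplitude spectrum (N/2 bins) of `input`.
// vDSP_fft_zip needs a power-of-two length, so only the first
// 2^floor(log2(input.count)) samples are transformed.
func performFFT(_ input: [Float]) -> [Float] {
    guard input.count >= 2 else { return [] }

    let log2n = vDSP_Length(floor(log2(Float(input.count))))
    let n = 1 << Int(log2n)
    let halfN = n / 2

    var real = Array(input.prefix(n))
    var imag = [Float](repeating: 0.0, count: n)
    var magnitudes = [Float](repeating: 0.0, count: halfN)

    guard let weights = vDSP_create_fftsetup(log2n, FFTRadix(kFFTRadix2)) else { return [] }
    defer { vDSP_destroy_fftsetup(weights) }

    // DSPSplitComplex must point at memory that stays valid during the FFT,
    // so build it inside withUnsafeMutableBufferPointer rather than from &real / &imag.
    real.withUnsafeMutableBufferPointer { realPtr in
        imag.withUnsafeMutableBufferPointer { imagPtr in
            var splitComplex = DSPSplitComplex(realp: realPtr.baseAddress!, imagp: imagPtr.baseAddress!)
            vDSP_fft_zip(weights, &splitComplex, 1, log2n, FFTDirection(FFT_FORWARD))
            // Squared magnitudes of bins 0..<N/2; for real input the upper half mirrors these.
            vDSP_zvmags(&splitComplex, 1, &magnitudes, 1, vDSP_Length(halfN))
        }
    }

    // Amplitude = sqrt(|X[k]|^2) * 2/N
    var normalizedMagnitudes = [Float](repeating: 0.0, count: halfN)
    vDSP_vsmul(sqrt(magnitudes), 1, [2.0 / Float(n)], &normalizedMagnitudes, 1, vDSP_Length(halfN))
    return normalizedMagnitudes
}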


public func sqrt(_ x: [Float]) -> [Float] {
    var results = [Float](repeating: 0.0, count: x.count)
    vvsqrtf(&results, x, [Int32(x.count)])
    return results
}
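Note that the array returned by `performFFT` is indexed by FFT bin, not by hertz. For an N-point FFT of audio sampled at `sampleRate`, bin k is centered at k · sampleRate / N Hz. At 44.1 kHz with N = 1024 the bins are about 43 Hz apart (44100 / 1024 ≈ 43.07), so the whole 85 to 180 Hz range falls in roughly bins 2 to 4. A minimal sketch, assuming you pass the FFT length and the tap format's sample rate (`recordingFormat.sampleRate`); `binFrequency` and `peakFrequency` are hypothetical helper names:

```swift
import Foundation

/// Hypothetical helper: center frequency (Hz) of FFT bin `bin`
/// for an `fftLength`-point FFT at `sampleRate` samples per second.
func binFrequency(_ bin: Int, fftLength: Int, sampleRate: Double) -> Double {
    return Double(bin) * sampleRate / Double(fftLength)
}

/// Hypothetical helper: frequency of the strongest bin among bins 1..<fftLength/2.
/// Bin 0 (the DC offset) is skipped. Returns nil if there is nothing to search.
func peakFrequency(magnitudes: [Float], fftLength: Int, sampleRate: Double) -> Double? {
    let searchable = magnitudes.prefix(fftLength / 2)
    guard searchable.count > 1 else { return nil }
    var bestBin = 1
    for bin in 1..<searchable.count where searchable[bin] > searchable[bestBin] {
        bestBin = bin
    }
    return binFrequency(bestBin, fftLength: fftLength, sampleRate: sampleRate)
}
```

Even with this mapping, the strongest bin of a voice is often a harmonic rather than the fundamental, so the peak bin alone is not a reliable pitch.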

I think I'm returning proper FFT data. Printing it looks like this:

[screenshot of the printed FFT output]

However, these can't be correct frequencies in Hz. It was me speaking, and average adult male voices have a fundamental frequency of roughly 85 to 180 Hz. I'm just not sure where to go from here.

My goal is to find the average frequency or frequency range when a user speaks into the mic. Thanks so much for any help!
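Since the result does not have to be real time, one option is to estimate a pitch for each buffer, keep the estimates, and summarize them once recording stops. A minimal sketch, assuming some per-buffer estimator that returns `nil` for silent or unvoiced buffers; `PitchSummary` and `summarize` are hypothetical names:

```swift
import Foundation

/// Hypothetical summary of per-buffer pitch estimates (all values in Hz).
struct PitchSummary {
    let mean: Double
    let minimum: Double
    let maximum: Double
}

/// Averages the voiced estimates and reports their range.
/// `nil` entries (silence or unvoiced buffers) are ignored.
func summarize(_ estimates: [Double?]) -> PitchSummary? {
    let voiced = estimates.compactMap { $0 }
    guard let lo = voiced.min(), let hi = voiced.max() else { return nil }
    let mean = voiced.reduce(0, +) / Double(voiced.count)
    return PitchSummary(mean: mean, minimum: lo, maximum: hi)
}
```

A median, or percentiles instead of min/max, would be less sensitive to occasional octave errors from the estimator.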

Upvotes: 2

Views: 1040

Answers (1)

hotpaw2

Reputation: 70693

The FFT magnitude is a spectral frequency estimator (which doesn't work for many voice pitches), not a pitch detection/estimation algorithm. Try a pitch estimation algorithm instead, which can better detect the fundamental pitch even when the harmonics/overtones of the voice carry more spectral power than the fundamental.
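The answer doesn't name a specific algorithm. As one simple example, here is a minimal time-domain autocorrelation sketch: it searches the lags that correspond to an assumed 60 to 400 Hz voice range and returns the frequency of the lag with the highest normalized autocorrelation, or `nil` when the buffer is silent or only weakly periodic. `estimatePitch`, the search range, and the 0.3 threshold are hypothetical, untuned choices; more robust methods such as YIN refine the same idea.

```swift
import Foundation

/// Hypothetical autocorrelation pitch estimator.
/// Returns the estimated fundamental frequency in Hz, or nil if no
/// sufficiently periodic signal is found in [minHz, maxHz].
func estimatePitch(_ samples: [Float], sampleRate: Double,
                   minHz: Double = 60, maxHz: Double = 400,
                   threshold: Float = 0.3) -> Double? {
    // Convert the frequency search range into a lag range (in samples).
    let minLag = max(1, Int(sampleRate / maxHz))
    let maxLag = min(samples.count - 1, Int(sampleRate / minHz))
    guard minLag < maxLag else { return nil }

    // Energy at lag 0, used to normalize the autocorrelation.
    var energy: Float = 0
    for s in samples { energy += s * s }
    guard energy > 0 else { return nil }

    // Keep the lag with the largest normalized autocorrelation above the threshold.
    var bestLag = 0
    var bestValue: Float = threshold
    for lag in minLag...maxLag {
        var sum: Float = 0
        for i in 0..<(samples.count - lag) {
            sum += samples[i] * samples[i + lag]
        }
        let normalized = sum / energy
        if normalized > bestValue {
            bestValue = normalized
            bestLag = lag
        }
    }
    return bestLag > 0 ? sampleRate / Double(bestLag) : nil
}
```

In the tap you might call `estimatePitch(arrayOfData, sampleRate: recordingFormat.sampleRate)` per buffer and collect the non-nil results. A 1024-sample buffer at 44.1 kHz is long enough for the largest lag this range needs (44100 / 60 ≈ 735 samples).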

Upvotes: 1
