Reputation: 393
I'm trying to figure out the average frequency or range of a person's voice as they speak into the microphone. It does not have to be real time. My approach so far is to use AVAudioEngine and AVAudioPCMBuffer: get the buffer data and run an FFT on it.
    inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { (buffer: AVAudioPCMBuffer, when: AVAudioTime) in
        self.recognitionRequest?.append(buffer)
        let data = buffer.floatChannelData?[0]
        let arrayOfData = Array(UnsafeBufferPointer(start: data, count: Int(buffer.frameLength)))
        let fftData = self.performFFT(arrayOfData)
    }
    import Accelerate

    func performFFT(_ input: [Float]) -> [Float] {
        // Real part holds the samples; imaginary part starts at zero.
        var real = [Float](input)
        var imag = [Float](repeating: 0.0, count: input.count)
        var splitComplex = DSPSplitComplex(realp: &real, imagp: &imag)

        // log2 of the FFT size (this assumes input.count is a power of two).
        let length = vDSP_Length(floor(log2(Float(input.count))))
        let radix = FFTRadix(kFFTRadix2)
        let weights = vDSP_create_fftsetup(length, radix)
        vDSP_fft_zip(weights!, &splitComplex, 1, length, FFTDirection(FFT_FORWARD))

        // Squared magnitude of each complex bin.
        var magnitudes = [Float](repeating: 0.0, count: input.count)
        vDSP_zvmags(&splitComplex, 1, &magnitudes, 1, vDSP_Length(input.count))

        // Take the square root, then scale by 2/N.
        var normalizedMagnitudes = [Float](repeating: 0.0, count: input.count)
        vDSP_vsmul(sqrt(magnitudes), 1, [2.0 / Float(input.count)], &normalizedMagnitudes, 1, vDSP_Length(input.count))

        vDSP_destroy_fftsetup(weights)
        return normalizedMagnitudes
    }
    // Element-wise square root of an array, using vvsqrtf.
    public func sqrt(_ x: [Float]) -> [Float] {
        var results = [Float](repeating: 0.0, count: x.count)
        vvsqrtf(&results, x, [Int32(x.count)])
        return results
    }
I think I'm returning proper FFT Data, printing looks like this:
However, these can't be the correct values in Hz. I was the one speaking, and the average male voice has a range of 85 to 180 Hz. I'm just not sure where to go from here.
The goal is to find an average frequency or a frequency range for when a user speaks through the mic. Thanks so much for any help!
Upvotes: 2
Views: 1040
Reputation: 70693
The FFT magnitude is a spectral frequency estimator, not a pitch detection/estimation algorithm, and picking its peak doesn't work for many voice pitches. Try a pitch estimation algorithm instead, which can better detect the fundamental pitch even when the vocal harmonic/overtone series has more spectral power than the fundamental.
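As one illustration of what a pitch estimator could look like, here is a minimal sketch based on normalized autocorrelation. Autocorrelation is only one of several possible approaches and is my choice for this example; the function name `estimatePitch` and the defaults for `minHz`, `maxHz`, and `threshold` are hypothetical and untuned. It uses plain Swift loops for clarity rather than Accelerate.

```swift
import Foundation

/// Estimates the fundamental frequency (in Hz) of one mono frame using
/// normalized autocorrelation. Returns nil for silence or when no lag in
/// the search range correlates strongly enough.
/// NOTE: minHz, maxHz and threshold are illustrative assumptions.
func estimatePitch(samples: [Float],
                   sampleRate: Double,
                   minHz: Double = 60,
                   maxHz: Double = 400,
                   threshold: Float = 0.5) -> Double? {
    let n = samples.count
    // Convert the pitch search range into a lag range (in samples).
    let minLag = max(1, Int(sampleRate / maxHz))
    let maxLag = min(n - 1, Int(sampleRate / minHz))
    guard minLag < maxLag else { return nil }

    // Energy at lag 0, used to normalize the correlation values.
    let energy = samples.reduce(Float(0)) { $0 + $1 * $1 }
    guard energy > 0 else { return nil }

    var bestLag = 0
    var bestCorr: Float = 0
    for lag in minLag...maxLag {
        // Correlate the frame with a copy of itself shifted by `lag`.
        var sum: Float = 0
        for i in 0..<(n - lag) {
            sum += samples[i] * samples[i + lag]
        }
        let corr = sum / energy
        if corr > bestCorr {
            bestCorr = corr
            bestLag = lag
        }
    }

    // Reject frames without clear periodicity (noise, unvoiced sounds).
    guard bestLag > 0, bestCorr >= threshold else { return nil }
    // The strongest lag is taken as the period; frequency is its inverse.
    return sampleRate / Double(bestLag)
}
```

To get an average or range, you could call this on each buffer from the tap (passing the buffer's sample rate), keep the non-nil results, and take their mean, minimum, and maximum. Longer frames (a few thousand samples) tend to give steadier estimates for low voices than very short ones.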
Upvotes: 1