I am working on an application using CoreAudio on the iPhone/iPad. The application both plays audio through the speakers (output) as well as records audio from the microphone (input) at the same time. For the purposes of this application it is extremely important that I be able to compare the input and output, specifically how well they "line up" in the time domain. Because of this, correctly calculating the total latency between the input and output channels is critical. I am testing across 3 different devices. An iPhone, an iPad, and the simulator. I've been able to empirically determine that the latency for the iPhone is somewhere around 4050 samples, the iPad is closer to 4125 samples, and the simulator is roughly 2500 samples. After much research (aka googling) I found a smattering of discussions online about calculating latency in CoreAudio, but they generally pertain to using CoreAudio on OSX rather than iOS. Because of this, they refer to various functions that do not exist on iOS. However, it seems that for iOS the correct solution will be to use AVAudioSession and some combination of the inputLatency , outputLatency , and IOBufferDuration . However, no combinations of these values seem to add up to the empirically determined values above. In addition, I get wildly different values for each parameter when I check them before vs. after calling AudioUnitInitialize . Even more confusing is that the values are much closer to the expected latency before the call to AudioUnitInitialize , which is the opposite of what I would expect. Here are the values I am seeing. iPad (before): in 0.032375, out 0.013651, buf 0.023220, total samples 3054 iPad (after): in 0.000136, out 0.001633, buf 0.023220, total samples 1102 iPhone (before): in 0.065125, out 0.004500, buf 0.021333, total samples 4011 iPhone (after): 0.000354, out 0.000292, buf 0.021333, total samples 969 The simulator always returns 0.01 for in and out, but I suspect these aren't actual/correct values and that the simulator just doesn't support this functionality. One other potentially interesting note is that I'm using kAudioUnitSubType_VoiceProcessingIO rather than kAudioUnitSubType_RemoteIO which I do expect to add some additional latency. My assumption is that this would be included in the inputLatency value, but perhaps there's another value I need to query to include this? What's the correct way to determine the total latency between input and output in iOS?

CoreAudio: Calculate total latency between input and output with kAudioUnitSubType_VoiceProcessingIO

Reputation: 419

Each device has its own latency indicators. Even if the same model and OS version. Estimating time on simulators does not make sense. It will not show the actual latency of the devices.

Latency cannot be calculated with high accuracy. Because You do not take into account the time during which your signal reaches the microphone. Also at each start latency of work with streams is still imposed.

The microphone selected for recording is also affected. starting with iPhone 6 there are at least three of them. The default is lower.

I’ve been dealing with such issues for two years. The most effective way is to calibrate (balance) the device. When starting your audio unit, you need to send a random high-frequency signal. Getting it at the entrance, evaluate the difference and start from it.

I adjust the streams themselves with the help of buffers to always process the corresponding samples.

Better to do at every start. It takes a split second, but your I / O streams are always in sync.

EDIT 1

If you will do a calibrator:

Keep in mind that voiprocessing processes high-frequency sounds worse.
Audibility of frequencies above 18 kHz drops significantly.
When recording and playing at the same time, the top speaker is used by default (you most likely already know this).
When generating a signal, use only multiple frequencies (I don’t know how it is in English). Frequencies must be multiples of sampleRate / frameSize.

For example, with a sampling frequency of 44100 and a sample size of 512, you can use frequencies that are multiples of the ratio 44100/512 = 86.13.

Frequencies: 86.13 Hz, 172.27 Hz, 258.40 Hz, 344.53 Hz, 430.66 Hz, 516.80 Hz, 602.93 Hz, 689.06 Hz, 775.20 Hz, 861.33 Hz, 947.46 Hz, 1033.59 Hz, 1119.73 Hz, 1205.86 Hz, etc.

Otherwise, when converting a signal to a spectrum, you will get blur.

EDIT 2

Create sample and get sample spectrum example code.

import Foundation
import Accelerate
import AudioUnit
import AVFoundation

public class StackExample {


    //
    // createSample(512, [1, 3, 5])
    // Was create sample with length 512 reports for frequencies: 86.13 Hz (1), 258.40 Hz (3), 430.66 Hz (5).
    // Number of frequency is number of multiplicity 44100/512
    // You can use frequencies from 1 to half of frameSize
    //
    public func createSample(frameSize: Int, frequencies: [Int]) -> [Float] {
        // result sample
        var sample = [Float]()
        // prepare diferent report in sample
        for index in 0..<frameSize {
            var report: Float = 0.0
            for frequencyNumber in frequencies {
                report += sinf(2.0 * Float.pi * Float(index) * Float(frequencyNumber) / Float(frameSize))
            }
            // report value mast been in range between -1.0 and 1.0
            // if we send more one frequencies we must divide each report by the number of frequencies
            if frequencies.count > 1 { report = report / Float(frequencies.count) }

            // with this configuration, the signal will immediately play at maximum volume. It must be smoothed in sinusoin over the entire segment.
            report *= sinf(Float.pi * Float(index) / Float(frameSize - 1))

            sample.append(report)
        }

        return sample
    }

    // spectrum was half of count of reports in sample
    // for sample with length 512 get spectrum with 256 frequencies. Frequency numbers are also multiple like in code of generation of sample.
    public func getSpectrum(frameSize: Int, sample: [Float]) -> [Float] {
        // create fft setup
        let frameLog2Size = UInt(log2(Double(frameSize)))
        let fftSetup = vDSP_create_fftsetup(frameLog2Size, FFTRadix(FFT_RADIX2))!
        let spectrumSize = frameSize / 2

        var reals = [Float]()
        var imags = [Float]()

        for (idx, element) in sample.enumerated() {
            if idx % 2 == 0 {
                reals.append(element)
            } else {
                imags.append(element)
            }
        }

        var complexBuffer = DSPSplitComplex(realp: UnsafeMutablePointer(mutating: reals), imagp: UnsafeMutablePointer(mutating: imags))
        // direct fft transform
        vDSP_fft_zrip(fftSetup, &complexBuffer, 1, UInt(frameLog2Size), Int32(FFT_FORWARD))
        var magnitudes = [Float](repeating: 0.0, count: spectrumSize)
        // calculation of magnitudes
        vDSP_zvmags(&complexBuffer, 1, &magnitudes, 1, UInt(spectrumSize))
        return magnitudes
    }
}

EDIT 3

How work calibration in simple:

Send signal.
Listening input stream and await signal.
When you find sample with signal upper then threshold use prev current and next sample for binary search.

Upvotes: 3

hotpaw2

Reputation: 70743

Part of the audio latency discrepancy you are seeing is likely due to trying to configure your app's audio processing for 44100 samples per second.

The native hardware sample rate on any new iOS device is 48k sps (or perhaps an integer multiple thereof), so initializing your audio unit for 44.1k IO is possibly adding a (hidden software) sample rate conversion process or two to your audio graph. You might be able to remove this latency discrepancy by running your app's internal signal path at 48k sps (or possibly even 96k or 192k). If you need to use 44.1 .wav files, then handle any needed rate conversions outside the audio unit graph and within your app's own pre/post real-time processing code (e.g. convert and re-write the files if needed).

You might also be able to reduce the actual physical input-to-output latency by using the audio session to request much shorter audio buffer durations (less than 5 milliseconds may be possible on newer iOS devices) via setPreferredIOBufferDuration().

Not sure if the above is compatible with the voice processing subtype.

On the other hand, the iOS Simulator might be running on a Mac that supports a native 44.1k sample rate in hardware. Thus a possible reason for the difference in your measured iOS device vs. Simulator latencies.

Upvotes: 1

CoreAudio: Calculate total latency between input and output with kAudioUnitSubType_VoiceProcessingIO

Answers (2)

Related Questions