iOS10 Speech Recognition "Listening" sound effect

Question

I am doing live speech recognition with the new iOS 10 framework. I use AVCaptureSession to get the audio.

I have a "listening" beep sound to notify the user that they can begin talking. The best place to play that sound is at the first call to captureOutput(_:didOutputSampleBuffer:from:), but if I try to play a sound after starting the session, the sound just won't play. No error is thrown; it just silently fails to play.

What I tried:

Playing through a system sound (AudioServicesPlaySystemSound...())
Play an asset with AVPlayer
Running both of the above on the main queue, both async and sync

It seems that, whatever I do, it is impossible to play any kind of audio after triggering the recognition (I'm not sure whether it's specifically the AVCaptureSession or the SFSpeechAudioBufferRecognitionRequest / SFSpeechRecognitionTask that causes this).

Any ideas? Apple even recommends playing a "listening" sound effect (and does it themselves with Siri), but I couldn't find any reference or example showing how to actually do it. (Their "SpeakToMe" sample doesn't play a sound.)

I can play the sound before triggering the session, and that does work (starting the session when the sound finishes playing). But sometimes there's a lag before recognition actually starts, mostly when using Bluetooth headphones and switching from a different AudioSession category, for which I have no completion event. Because of that, I need a way to play the sound when recording actually starts, rather than playing it before triggering the session and crossing my fingers that startup won't lag.

Aviel Gross · Accepted Answer

Well, apparently there are a bunch of "rules" you must follow in order to successfully begin a speech recognition session and play a "listening" effect only once recognition has really begun.

The session setup and triggering must be called on the main queue. So:

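DispatchQueue.main.async {
    // Fresh request/task pair for every recognition session.
    // speechRequest is optional (see speechRequest?.append... below), so pass a
    // non-optional local to recognitionTask(with:delegate:).
    let request = SFSpeechAudioBufferRecognitionRequest()
    self.speechRequest = request
    self.task = self.recognizer.recognitionTask(with: request, delegate: self)
    self.capture = AVCaptureSession()
    //.....
    // Arm the one-time "recording began" handling in the sample-buffer callback
    self.shouldHandleRecordingBegan = true
    self.capture?.startRunning()
}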
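The `//.....` above leaves out how the capture session is wired to the microphone. Below is a minimal sketch of one way to do that, assuming Swift 4.2+ naming (so some names differ from the Swift 3 code in this answer). The function `makeAudioCaptureSession` and the `CaptureSetupError` type are hypothetical names for illustration, and which delegate queue you pass in is your choice.

```swift
import AVFoundation

// Hypothetical error type for this sketch.
enum CaptureSetupError: Error {
    case noMicrophone
    case cannotAddInput
    case cannotAddOutput
}

// Hypothetical helper: builds a session that delivers microphone sample
// buffers to `delegate` on `queue`. The caller still calls startRunning().
func makeAudioCaptureSession(delegate: AVCaptureAudioDataOutputSampleBufferDelegate,
                             queue: DispatchQueue) throws -> AVCaptureSession {
    guard let mic = AVCaptureDevice.default(for: .audio) else {
        throw CaptureSetupError.noMicrophone
    }
    let input = try AVCaptureDeviceInput(device: mic)

    let output = AVCaptureAudioDataOutput()
    // The delegate's captureOutput callback will be invoked on this queue.
    output.setSampleBufferDelegate(delegate, queue: queue)

    let session = AVCaptureSession()
    session.beginConfiguration()
    // Commit even if we bail out early with an error.
    defer { session.commitConfiguration() }

    guard session.canAddInput(input) else { throw CaptureSetupError.cannotAddInput }
    session.addInput(input)
    guard session.canAddOutput(output) else { throw CaptureSetupError.cannotAddOutput }
    session.addOutput(output)
    return session
}
```

Passing a dedicated serial queue (rather than the main queue) keeps the per-buffer work off the UI thread.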

The "listening" effect should be played via AVPlayer, not as a system sound.
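A small sketch of the AVPlayer approach, assuming the effect ships as an `.aiff` file in the main bundle. The `EarconPlayer` class is a hypothetical name. The main point it illustrates is keeping the AVPlayer alive in a property: a player held only in a local variable is deallocated when the function returns, and playback stops.

```swift
import AVFoundation

// Hypothetical helper for short sound effects ("earcons").
final class EarconPlayer {
    // Strong reference so the player outlives the play(...) call.
    private var player: AVPlayer?

    /// Plays `name.ext` from the main bundle.
    /// Returns false (and plays nothing) if the resource is missing.
    @discardableResult
    func play(resource name: String, withExtension ext: String = "aiff") -> Bool {
        guard let url = Bundle.main.url(forResource: name, withExtension: ext) else {
            return false
        }
        let newPlayer = AVPlayer(url: url)
        player = newPlayer
        newPlayer.play()
        return true
    }
}
```

The `EarconPlayer` instance itself must also be stored (for example as a property of the object that owns the capture session), then `effects.play(resource: "listening")` can be called from the first sample-buffer callback.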

The safest place to know we are definitely recording is the AVCaptureAudioDataOutputSampleBufferDelegate callback, when we get our first sampleBuffer:

// Swift 3 signature; in Swift 4 and later this is captureOutput(_:didOutput:from:)
func captureOutput(_ captureOutput: AVCaptureOutput!, didOutputSampleBuffer sampleBuffer: CMSampleBuffer!, from connection: AVCaptureConnection!) {

    // Only once per recognition session: the first buffer proves recording has started
    if shouldHandleRecordingBegan {
        shouldHandleRecordingBegan = false

        // Keep the player in a property; a local AVPlayer would be deallocated and stop playing
        player = AVPlayer(url: Bundle.main.url(forResource: "listening", withExtension: "aiff")!)
        player.play()

        DispatchQueue.main.async {
            //call delegate/handler closure/post notification etc...
        }
    }

    // Append every buffer to the speech recognition request
    speechRequest?.appendAudioSampleBuffer(sampleBuffer)
}
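One caveat with a plain Bool: shouldHandleRecordingBegan is set on the main queue but read and cleared on the queue that delivers sample buffers, which is a data race unless that delegate queue happens to be the main queue. A minimal lock-protected alternative, assuming Foundation's `NSLock`; `OneShotGate`, `arm()` and `consume()` are hypothetical names:

```swift
import Foundation

// Hypothetical thread-safe replacement for a "do this once" Bool flag.
final class OneShotGate {
    private let lock = NSLock()
    private var armed = false

    /// Call when a new recognition session is about to start (main queue).
    func arm() {
        lock.lock()
        defer { lock.unlock() }
        armed = true
    }

    /// Returns true exactly once after each arm(), from any queue.
    func consume() -> Bool {
        lock.lock()
        defer { lock.unlock() }
        guard armed else { return false }
        armed = false
        return true
    }
}
```

In the callback, `if recordingBegan.consume() { ... }` would replace the check-and-clear of shouldHandleRecordingBegan, and `recordingBegan.arm()` would replace `shouldHandleRecordingBegan = true`.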

The end-of-recognition effect is a hell of a lot easier:

var ended = false

// Only finish a task that is still active, so "done" plays at most once
if task?.state == .running || task?.state == .starting {
    task?.finish() // or task?.cancel() to cancel and not get results.
    ended = true
}

if capture?.isRunning == true {
    capture?.stopRunning()
}

if ended {
    // Again, keep the player in a property so it isn't deallocated mid-playback
    player = AVPlayer(url: Bundle.main.url(forResource: "done", withExtension: "aiff")!)
    player.play()
}
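The same teardown can be packaged as a reusable function. This is a sketch assuming Swift 4.2+ and that the caller owns the task and the session; the name `stopListening` and its `cancel` parameter are hypothetical. It returns whether a task was actually active, which is the condition for playing the "done" effect.

```swift
import AVFoundation
import Speech

// Hypothetical helper: stops recognition and capture.
// Returns true only if a recognition task was still active.
func stopListening(task: SFSpeechRecognitionTask?,
                   capture: AVCaptureSession?,
                   cancel: Bool = false) -> Bool {
    var ended = false
    if let task = task, task.state == .starting || task.state == .running {
        // finish() still delivers final results; cancel() discards them.
        if cancel { task.cancel() } else { task.finish() }
        ended = true
    }
    if let capture = capture, capture.isRunning {
        capture.stopRunning()
    }
    return ended
}
```

Usage: `if stopListening(task: task, capture: capture) { /* play "done" */ }`.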
