Aviel Gross

Reputation: 9975

iOS10 Speech Recognition "Listening" sound effect

I am doing live speech recognition with the new iOS 10 Speech framework. I use AVCaptureSession to capture the audio.

I have a "listening" beep sound to notify the user that they can begin talking. The best place to play that sound is the first call to captureOutput(:didOutputSampleBuffer..), but if I try to play a sound after starting the session, the sound just won't play. No error is thrown; it just silently fails to play.

What I tried:

It seems like regardless of what I am doing, it is impossible to trigger playing any kind of audio after triggering the recognition (not sure if it's specifically the AVCaptureSession or the SFSpeechAudioBufferRecognitionRequest / SFSpeechRecognitionTask...)

Any ideas? Apple even recommends playing a "listening" sound effect (and does it themselves with Siri), but I couldn't find any reference or example showing how to actually do it. (Their "SpeakToMe" example doesn't play a sound.)

Upvotes: 5

Views: 801

Answers (1)

Aviel Gross

Reputation: 9975

Well, apparently there are a bunch of "rules" one must follow in order to successfully begin a speech recognition session and play a "listening" effect only when (after) the recognition really began.

  1. The session setup and triggering must be called on the main queue. So:

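    DispatchQueue.main.async {
        // Create the request and the task before starting the capture session
        speechRequest = SFSpeechAudioBufferRecognitionRequest()
        task = recognizer.recognitionTask(with: speechRequest, delegate: self)
        capture = AVCaptureSession()
        //.....
        // Arm the one-time "recording began" handling used in captureOutput
        shouldHandleRecordingBegan = true
        capture?.startRunning()
    }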
    
  2. The "listening" effect should be played via AVPlayer, not as a system sound.

  3. The safest place to know we are definitely recording is the AVCaptureAudioDataOutputSampleBufferDelegate callback, when we get our first sampleBuffer:

    func captureOutput(_ captureOutput: AVCaptureOutput!, didOutputSampleBuffer sampleBuffer: CMSampleBuffer!, from connection: AVCaptureConnection!) {
    
        //only once per recognition session
        if shouldHandleRecordingBegan {
            shouldHandleRecordingBegan = false
    
            player = AVPlayer(url: Bundle.main.url(forResource: "listening", withExtension: "aiff")!)
            player.play()            
    
            DispatchQueue.main.async {
                //call delegate/handler closure/post notification etc...
            }
        }
    
        // append buffer to speech recognition
        speechRequest?.appendAudioSampleBuffer(sampleBuffer)
    }
    
  4. The end-of-recognition effect is a hell of a lot easier:

    var ended = false
    
    if task?.state == .running || task?.state == .starting {
        task?.finish() // or task?.cancel() to cancel and not get results.
        ended = true
    }
    
    if capture?.isRunning == true {
        capture?.stopRunning()
    }
    
    if ended {
        player = AVPlayer(url: Bundle.main.url(forResource: "done", withExtension: "aiff")!)
        player.play()
    }
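
One thing to watch in step 3: `shouldHandleRecordingBegan` is set on the main queue but read and cleared inside the sample buffer callback, which is delivered on whatever queue was passed to `setSampleBufferDelegate(_:queue:)`. If that is not the main queue, the flag is touched from two queues. Below is a minimal sketch of a lock-protected "fire once" flag that could replace the plain Bool. `OneShotFlag`, `arm()` and `consume()` are hypothetical names of mine, not part of AVFoundation or the Speech framework, and the loop only simulates repeated callbacks.

```swift
import Foundation

/// Hypothetical helper (not from the answer above): a flag that is armed once
/// per recognition session and can be consumed exactly once, from any queue.
final class OneShotFlag {
    private let lock = NSLock()
    private var armed = false

    /// Arm the flag, e.g. right before `capture?.startRunning()`.
    func arm() {
        lock.lock()
        defer { lock.unlock() }
        armed = true
    }

    /// Returns true only for the first caller after `arm()`, false afterwards.
    func consume() -> Bool {
        lock.lock()
        defer { lock.unlock() }
        guard armed else { return false }
        armed = false
        return true
    }
}

let recordingBegan = OneShotFlag()
recordingBegan.arm()

// Simulate several sample buffer callbacks; only the first one should "fire".
var fired = 0
for _ in 0..<5 {
    if recordingBegan.consume() {
        fired += 1   // this is where the "listening" sound would be played
    }
}
print(fired) // prints 1
```

In the delegate method, `if shouldHandleRecordingBegan { shouldHandleRecordingBegan = false ... }` would then become `if recordingBegan.consume() { ... }`, assuming `recordingBegan.arm()` is called where the flag is currently set to `true`.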
    

Upvotes: 3
