Reputation: 495
I am using AVAudioEngine to record sound and analyze it with Apple's built-in speech recognition (SFSpeechRecognizer). When I turn the audio engine on with a button, the audio buffers complete instantly and do not pick up any sound. The buffers only begin glitching when I turn the button on for the second time, which leads me to believe that some taps aren't being removed from the audio engine when I stop recording; the leftover taps then collide with each other and end the buffers immediately.
I've searched through a number of posts on AVAudioEngine glitches caused by unremoved taps, but nothing I've tried so far has worked. Here is what I have:
func recognizeAudioStream() {
    let speechRecognizer = SFSpeechRecognizer()
    //performs speech recognition on live audio; as audio is captured, call append
    //on the request object, call endAudio() to end speech recognition
    var recognitionRequest: SFSpeechAudioBufferRecognitionRequest?
    //determines & edits state of speech recognition task (end, start, cancel, etc)
    var recognitionTask: SFSpeechRecognitionTask?
    let audioEngine = AVAudioEngine()

    func startRecording() throws {
        //cancel previous audio task
        recognitionTask?.cancel()
        recognitionTask = nil

        //get info from microphone
        let audioSession = AVAudioSession.sharedInstance()
        try audioSession.setCategory(.record, mode: .measurement, options: .duckOthers)
        try audioSession.setActive(true, options: .notifyOthersOnDeactivation)
        let inputNode = audioEngine.inputNode

        //audio buffer; takes a continuous input of audio and recognizes speech
        recognitionRequest = SFSpeechAudioBufferRecognitionRequest()
        //allows device to print results of your speech before you're done talking
        recognitionRequest?.shouldReportPartialResults = true

        recognitionTask = speechRecognizer!.recognitionTask(with: recognitionRequest!) { result, error in
            var isFinal = false
            if let result = result { //if we can let result be the nonoptional version of result, then
                isFinal = result.isFinal
                print("Text: \(result.bestTranscription.formattedString)")
            }
            if error != nil || result!.isFinal { //if an error occurs or we're done speaking
                audioEngine.stop()
                inputNode.removeTap(onBus: 0)
                recognitionTask = nil
                recognitionRequest = nil

                let bufferText = result?.bestTranscription.formattedString.components(separatedBy: (" "))
                print("completed buffer")
                self.addToDictionary(wordNames: bufferText)
                self.populateTempWords()
                wordsColl.reloadData()

                do {
                    try startRecording()
                }
                catch {
                    print(error)
                }
            }
        }

        //configure microphone; let the recording format match that of the bus we are using
        let recordingFormat = inputNode.outputFormat(forBus: 0)
        //contents of the buffer will be dumped into recognitionRequest and into result, where
        //it will then be transcribed and printed out
        //1024 frames = dumping "limit": once the buffer fills to 1024 frames, it is appended to recognitionRequest
        inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { (buffer: AVAudioPCMBuffer, when: AVAudioTime) in
            recognitionRequest?.append(buffer)
        }

        audioEngine.prepare()
        try audioEngine.start()
    }

    do {
        if(!isRecording) {
            audioEngine.mainMixerNode.removeTap(onBus: 0)
            audioEngine.stop()
            return
        }
        try startRecording()
    }
    catch {
        print(error)
    }
}
Above is my recognizeAudioStream() function. It begins by initializing an SFSpeechRecognizer() and an AVAudioEngine(), and then calls startRecording(). The previous recognition task is cancelled, the audio session is configured, and the engine's inputNode is retrieved; the tap is installed on that node. The variable isFinal stores result.isFinal, which is set inside the recognitionTask closure above it. When the task reports a final result or an error occurs, the tap on the inputNode is removed, the recognitionTask and recognitionRequest are reset, and the text from the buffer is analyzed and loaded into my words dictionary. Beneath this block is the setup of the tap, buffer, and audioEngine.
Finally, a do/catch block checks whether the audioEngine should be recording (isRecording is false if it shouldn't), in which case the tap is removed and the audioEngine is stopped. Otherwise, startRecording() is called again to fill another buffer. I have no problems with this until the record button is turned on for the second time.
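For reference, this is the full cleanup I believe a recording session needs before it can be restarted cleanly. It is a sketch only: stopRecording() is not a function in my current code, and it assumes audioEngine, recognitionRequest, and recognitionTask were stored as properties rather than locals:

// Hypothetical helper (not in my current code) collecting the cleanup steps
// a live-recognition session needs before it is started again.
func stopRecording() {
    audioEngine.stop()
    audioEngine.inputNode.removeTap(onBus: 0) // remove the tap from bus 0 of the node it was installed on
    recognitionRequest?.endAudio()            // tell the request no more audio is coming
    recognitionTask?.cancel()
    recognitionRequest = nil
    recognitionTask = nil
}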
Here is my record button's objc function:
@objc func recordingState() {
    if isRecording {
        recordButton.setImage(UIImage(named: "unfilled_star.png"), for: .normal)
        isRecording = false
    }
    else {
        print("try to start recording")
        recordButton.setImage(UIImage(named: "filled_star.png"), for: .normal)
        let speechRecognizer = SFSpeechRecognizer()
        requestDictAccess()
        if speechRecognizer!.isAvailable { //if the user has granted permission
            speechRecognizer?.supportsOnDeviceRecognition = true //for offline data
            isRecording = true
            recognizeAudioStream()
        }
    }
}
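requestDictAccess() is my own permission helper. For completeness, a minimal sketch of that kind of helper (not my exact implementation) would use the standard authorization APIs like this:

import Speech
import AVFoundation

// Sketch of a permission helper along the lines of requestDictAccess()
// (hypothetical, not the exact implementation): ask for speech recognition
// and microphone access, and log the outcome.
func requestDictAccess() {
    SFSpeechRecognizer.requestAuthorization { status in
        print(status == .authorized ? "authorized" : "speech recognition not authorized")
    }
    AVAudioSession.sharedInstance().requestRecordPermission { granted in
        print(granted ? "microphone access granted" : "microphone access denied")
    }
}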
When running recognizeAudioStream() for the second time, I get this output, the entire block in roughly a second:
authorized
2021-04-22 22:38:29.671122-0400 DictoCounter[76126:11539342] [aurioc] 323: Unable to join I/O thread to workgroup ((null)): 2
2021-04-22 22:38:29.679718-0400 DictoCounter[76126:11539130] [Utility] +[AFAggregator logDictationFailedWithError:] Error Domain=kAFAssistantErrorDomain Code=209 "(null)"
completed buffer
2021-04-22 22:38:29.688449-0400 DictoCounter[76126:11538858] Words successfully saved.
2021-04-22 22:38:29.690209-0400 DictoCounter[76126:11539349] [aurioc] 323: Unable to join I/O thread to workgroup ((null)): 2
2021-04-22 22:38:29.699224-0400 DictoCounter[76126:11539130] [Utility] +[AFAggregator logDictationFailedWithError:] Error Domain=kAFAssistantErrorDomain Code=209 "(null)"
completed buffer
2021-04-22 22:38:29.719147-0400 DictoCounter[76126:11538858] Words successfully saved.
2021-04-22 22:38:29.720698-0400 DictoCounter[76126:11539352] [aurioc] 323: Unable to join I/O thread to workgroup ((null)): 2
2021-04-22 22:38:29.734173-0400 DictoCounter[76126:11539102] [Utility] +[AFAggregator logDictationFailedWithError:] Error Domain=kAFAssistantErrorDomain Code=209 "(null)"
completed buffer
2021-04-22 22:38:29.740684-0400 DictoCounter[76126:11538858] Words successfully saved.
2021-04-22 22:38:29.741952-0400 DictoCounter[76126:11539370] [aurioc] 323: Unable to join I/O thread to workgroup ((null)): 2
2021-04-22 22:38:29.754000-0400 DictoCounter[76126:11539102] [Utility] +[AFAggregator logDictationFailedWithError:] Error Domain=kAFAssistantErrorDomain Code=209 "(null)"
completed buffer
2021-04-22 22:38:29.761909-0400 DictoCounter[76126:11538858] Words successfully saved.
2021-04-22 22:38:29.763036-0400 DictoCounter[76126:11539371] [aurioc] 323: Unable to join I/O thread to workgroup ((null)): 2
2021-04-22 22:38:29.776616-0400 DictoCounter[76126:11539104] [Utility] +[AFAggregator logDictationFailedWithError:] Error Domain=kAFAssistantErrorDomain Code=209 "(null)"
completed buffer
I'm not too sure what to do here. I've looked through many other posts which suggest that the taps are not being removed, but I removed all taps in the do/catch block. I'm not sure what I'm missing. Appreciative of any and all help, thank you in advance!
Upvotes: 1
Views: 636
Reputation: 88192
A possible problem is a route/configuration change or an interruption. You can check whether any of these notifications gets called when you end the recording:
private var notificationTokens = [NSObjectProtocol]()

// init
notificationTokens = [
    NotificationCenter.default.addObserver(
        forName: AVAudioSession.routeChangeNotification,
        object: nil,
        queue: .main
    ) { notification in
        self.engineStartTimer?.invalidate()
        guard self.resumePlaybackAfterSeek == nil else { return }
        self.scheduledBufferFrames = 0
        self.ignoreBufferIndex = self.lastBufferIndex
        self.resumePlaybackAfterSeek = false
        startRestartEngineTimer()
    },
    NotificationCenter.default.addObserver(
        forName: AVAudioSession.interruptionNotification,
        object: nil,
        queue: .main
    ) { notification in
        startRestartEngineTimer()
    },
    NotificationCenter.default.addObserver(
        forName: .AVAudioEngineConfigurationChange,
        object: nil,
        queue: .main
    ) { notification in
        startRestartEngineTimer()
    }
]
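As a side note, the AVAudioSession notifications carry userInfo describing what happened, so you can also log the reason to see exactly what fires when your recording ends. A debug-only sketch using the standard keys (independent of the engine-restart logic above):

// Debug-only observers that just log why the session was disturbed.
// Keep the returned tokens alive, e.g. in the notificationTokens array above.
let interruptionToken = NotificationCenter.default.addObserver(
    forName: AVAudioSession.interruptionNotification,
    object: nil,
    queue: .main
) { notification in
    if let raw = notification.userInfo?[AVAudioSessionInterruptionTypeKey] as? UInt,
       let type = AVAudioSession.InterruptionType(rawValue: raw) {
        print("interruption", type == .began ? "began" : "ended")
    }
}
let routeChangeToken = NotificationCenter.default.addObserver(
    forName: AVAudioSession.routeChangeNotification,
    object: nil,
    queue: .main
) { notification in
    if let raw = notification.userInfo?[AVAudioSessionRouteChangeReasonKey] as? UInt,
       let reason = AVAudioSession.RouteChangeReason(rawValue: raw) {
        print("route change reason:", reason.rawValue)
    }
}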
If this doesn't help, please provide an MRE.
Upvotes: 0