Reputation: 495
I am using AVAudioEngine to record sound and analyze it with Apple's built-in speech recognition (SFSpeechRecognizer). When I turn the audio engine on with a button, the audio buffers complete instantly and do not pick up any sound. The buffers only begin glitching when I turn the button on for the second time, which leads me to believe that some taps aren't being removed from the audio engine when I stop recording; the leftover taps then collide with each other and end the buffers immediately.
I've searched through a number of posts on AVAudioEngine glitches caused by unremoved taps, but nothing I've tried so far has worked. Here is what I have:
func recognizeAudioStream() {
    let speechRecognizer = SFSpeechRecognizer()
    //performs speech recognition on live audio; as audio is captured, call append
    //on the request object, call endAudio() to end speech recognition
    var recognitionRequest: SFSpeechAudioBufferRecognitionRequest?
    //determines & edits state of speech recognition task (end, start, cancel, etc)
    var recognitionTask: SFSpeechRecognitionTask?
    let audioEngine = AVAudioEngine()

    func startRecording() throws {
        //cancel previous audio task
        recognitionTask?.cancel()
        recognitionTask = nil

        //get info from microphone
        let audioSession = AVAudioSession.sharedInstance()
        try audioSession.setCategory(.record, mode: .measurement, options: .duckOthers)
        try audioSession.setActive(true, options: .notifyOthersOnDeactivation)
        let inputNode = audioEngine.inputNode

        //audio buffer; takes a continuous input of audio and recognizes speech
        recognitionRequest = SFSpeechAudioBufferRecognitionRequest()
        //allows device to print results of your speech before you're done talking
        recognitionRequest?.shouldReportPartialResults = true

        recognitionTask = speechRecognizer!.recognitionTask(with: recognitionRequest!) { result, error in
            var isFinal = false
            if let result = result { //if we can let result be the nonoptional version of result, then
                isFinal = result.isFinal
                print("Text: \(result.bestTranscription.formattedString)")
            }
            if error != nil || result!.isFinal { //if an error occurs or we're done speaking
                audioEngine.stop()
                inputNode.removeTap(onBus: 0)
                recognitionTask = nil
                recognitionRequest = nil

                let bufferText = result?.bestTranscription.formattedString.components(separatedBy: (" "))
                print("completed buffer")
                self.addToDictionary(wordNames: bufferText)
                self.populateTempWords()
                wordsColl.reloadData()

                do {
                    try startRecording()
                }
                catch {
                    print(error)
                }
            }
        }

        //configure microphone; let the recording format match that of the bus we are using
        let recordingFormat = inputNode.outputFormat(forBus: 0)
        //contents of the buffer will be dumped into recognitionRequest and into result, where
        //it will then be transcribed and printed out
        //1024 frames = dumping "limit": once the buffer fills to 1024 frames, it is appended to recognitionRequest
        inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { (buffer: AVAudioPCMBuffer, when: AVAudioTime) in
            recognitionRequest?.append(buffer)
        }

        audioEngine.prepare()
        try audioEngine.start()
    }

    do {
        if(!isRecording) {
            audioEngine.mainMixerNode.removeTap(onBus: 0)
            audioEngine.stop()
            return
        }
        try startRecording()
    }
    catch {
        print(error)
    }
}
Above is my recognizeAudioStream() function. It begins by initializing an SFSpeechRecognizer() and an AVAudioEngine(), and then calls startRecording(). The previous recognition task is cancelled, the audio session is configured, and the engine's inputNode is retrieved; the tap is installed on that node. The variable isFinal stores result.isFinal, which is set inside the recognitionTask closure above it. When the task reports a final result or an error occurs, the tap on the inputNode is removed, the recognitionTask and recognitionRequest are reset, and the text from the buffer is analyzed and loaded into my words dictionary. Beneath this block is the setup of the tap, buffer, and audioEngine.
Finally, a do/catch block checks whether the audioEngine should be recording (isRecording is false if it shouldn't), in which case the tap is removed and the audioEngine is stopped. Otherwise, startRecording() is called again to fill another buffer. I have no problems with this until the record button is turned on for the second time.
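For reference, this is the full cleanup I believe a recording session needs before it can be restarted cleanly. It is a sketch only: stopRecording() is not a function in my current code, and it assumes audioEngine, recognitionRequest, and recognitionTask were stored as properties rather than locals:

// Hypothetical helper (not in my current code) collecting the cleanup steps
// a live-recognition session needs before it is started again.
func stopRecording() {
    audioEngine.stop()
    audioEngine.inputNode.removeTap(onBus: 0) // remove the tap from bus 0 of the node it was installed on
    recognitionRequest?.endAudio()            // tell the request no more audio is coming
    recognitionTask?.cancel()
    recognitionRequest = nil
    recognitionTask = nil
}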
Here is my record button's objc function:
@objc func recordingState() {
    if isRecording {
        recordButton.setImage(UIImage(named: "unfilled_star.png"), for: .normal)
        isRecording = false
    }
    else {
        print("try to start recording")
        recordButton.setImage(UIImage(named: "filled_star.png"), for: .normal)
        let speechRecognizer = SFSpeechRecognizer()
        requestDictAccess()
        if speechRecognizer!.isAvailable { //if the user has granted permission
            speechRecognizer?.supportsOnDeviceRecognition = true //for offline data
            isRecording = true
            recognizeAudioStream()
        }
    }
}
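requestDictAccess() is my own permission helper. For completeness, a minimal sketch of that kind of helper (not my exact implementation) would use the standard authorization APIs like this:

import Speech
import AVFoundation

// Sketch of a permission helper along the lines of requestDictAccess()
// (hypothetical, not the exact implementation): ask for speech recognition
// and microphone access, and log the outcome.
func requestDictAccess() {
    SFSpeechRecognizer.requestAuthorization { status in
        print(status == .authorized ? "authorized" : "speech recognition not authorized")
    }
    AVAudioSession.sharedInstance().requestRecordPermission { granted in
        print(granted ? "microphone access granted" : "microphone access denied")
    }
}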
When running recognizeAudioStream() for the second time, I get this output, the entire block in roughly a second:
authorized
2021-04-22 22:38:29.671122-0400 DictoCounter[76126:11539342] [aurioc] 323: Unable to join I/O thread to workgroup ((null)): 2
2021-04-22 22:38:29.679718-0400 DictoCounter[76126:11539130] [Utility] +[AFAggregator logDictationFailedWithError:] Error Domain=kAFAssistantErrorDomain Code=209 "(null)"
completed buffer
2021-04-22 22:38:29.688449-0400 DictoCounter[76126:11538858] Words successfully saved.
2021-04-22 22:38:29.690209-0400 DictoCounter[76126:11539349] [aurioc] 323: Unable to join I/O thread to workgroup ((null)): 2
2021-04-22 22:38:29.699224-0400 DictoCounter[76126:11539130] [Utility] +[AFAggregator logDictationFailedWithError:] Error Domain=kAFAssistantErrorDomain Code=209 "(null)"
completed buffer
2021-04-22 22:38:29.719147-0400 DictoCounter[76126:11538858] Words successfully saved.
2021-04-22 22:38:29.720698-0400 DictoCounter[76126:11539352] [aurioc] 323: Unable to join I/O thread to workgroup ((null)): 2
2021-04-22 22:38:29.734173-0400 DictoCounter[76126:11539102] [Utility] +[AFAggregator logDictationFailedWithError:] Error Domain=kAFAssistantErrorDomain Code=209 "(null)"
completed buffer
2021-04-22 22:38:29.740684-0400 DictoCounter[76126:11538858] Words successfully saved.
2021-04-22 22:38:29.741952-0400 DictoCounter[76126:11539370] [aurioc] 323: Unable to join I/O thread to workgroup ((null)): 2
2021-04-22 22:38:29.754000-0400 DictoCounter[76126:11539102] [Utility] +[AFAggregator logDictationFailedWithError:] Error Domain=kAFAssistantErrorDomain Code=209 "(null)"
completed buffer
2021-04-22 22:38:29.761909-0400 DictoCounter[76126:11538858] Words successfully saved.
2021-04-22 22:38:29.763036-0400 DictoCounter[76126:11539371] [aurioc] 323: Unable to join I/O thread to workgroup ((null)): 2
2021-04-22 22:38:29.776616-0400 DictoCounter[76126:11539104] [Utility] +[AFAggregator logDictationFailedWithError:] Error Domain=kAFAssistantErrorDomain Code=209 "(null)"
completed buffer
I'm not too sure what to do here. I've looked through many other posts which suggest that the taps are not being removed, but I removed all taps in the do/catch block. I'm not sure what I'm missing. Appreciative of any and all help, thank you in advance!
Upvotes: 1
Views: 636
Reputation: 88192
A possible problem is a route/configuration change or an interruption. You can check whether any of these notifications gets called when you end the recording:
private var notificationTokens = [NSObjectProtocol]()

// init
notificationTokens = [
    NotificationCenter.default.addObserver(
        forName: AVAudioSession.routeChangeNotification,
        object: nil,
        queue: .main
    ) { notification in
        self.engineStartTimer?.invalidate()
        guard self.resumePlaybackAfterSeek == nil else { return }
        self.scheduledBufferFrames = 0
        self.ignoreBufferIndex = self.lastBufferIndex
        self.resumePlaybackAfterSeek = false
        startRestartEngineTimer()
    },
    NotificationCenter.default.addObserver(
        forName: AVAudioSession.interruptionNotification,
        object: nil,
        queue: .main
    ) { notification in
        startRestartEngineTimer()
    },
    NotificationCenter.default.addObserver(
        forName: .AVAudioEngineConfigurationChange,
        object: nil,
        queue: .main
    ) { notification in
        startRestartEngineTimer()
    }
]
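As a side note, the AVAudioSession notifications carry userInfo describing what happened, so you can also log the reason to see exactly what fires when your recording ends. A debug-only sketch using the standard keys (independent of the engine-restart logic above):

// Debug-only observers that just log why the session was disturbed.
// Keep the returned tokens alive, e.g. in the notificationTokens array above.
let interruptionToken = NotificationCenter.default.addObserver(
    forName: AVAudioSession.interruptionNotification,
    object: nil,
    queue: .main
) { notification in
    if let raw = notification.userInfo?[AVAudioSessionInterruptionTypeKey] as? UInt,
       let type = AVAudioSession.InterruptionType(rawValue: raw) {
        print("interruption", type == .began ? "began" : "ended")
    }
}
let routeChangeToken = NotificationCenter.default.addObserver(
    forName: AVAudioSession.routeChangeNotification,
    object: nil,
    queue: .main
) { notification in
    if let raw = notification.userInfo?[AVAudioSessionRouteChangeReasonKey] as? UInt,
       let reason = AVAudioSession.RouteChangeReason(rawValue: raw) {
        print("route change reason:", reason.rawValue)
    }
}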
If this doesn't help, please provide an MRE.
Upvotes: 0