Reputation: 105
I am writing a spelling bee app. I have been using SFSpeechRecognizer, but it doesn't do very well with single letters; I'm guessing it's looking for spoken phrases.
I've been googling SFSpeechRecognizer for a while and haven't found much about getting it to recognize single letters.
As a workaround, I have had to generate a list of the strings SFSpeechRecognizer kicks out when letters are spoken and just validate against that list.
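For context, a minimal sketch of that kind of lookup (the map name and the entries below are only illustrative examples, not the actual generated list):

import Foundation

// Map the strings the recognizer typically returns for spoken letters
// back to the intended letter. Entries here are examples only.
let spokenLetterMap: [String: String] = [
    "hey": "A", "a": "A",
    "bee": "B", "be": "B",
    "see": "C", "sea": "C",
    "dee": "D",
]

func validateLetter(from transcription: String) -> String? {
    spokenLetterMap[transcription.lowercased().trimmingCharacters(in: .whitespaces)]
}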
Is there some setting in SFSpeechRecognizer that will make it handle single spoken letters better?
Upvotes: 4
Views: 841
Reputation: 2340
Even though the thread is old, I might have some good results to share for anyone passing by.
The "trick" I am using is to actually make a letter correspond to a "word", or something close:
recognitionRequest = SFSpeechAudioBufferRecognitionRequest()
guard let recognitionRequest = recognitionRequest else { fatalError("Unable to create an SFSpeechAudioBufferRecognitionRequest object") }
recognitionRequest.shouldReportPartialResults = true

// Associate each letter with a word or an onomatopoeia
let letters = ["Hey": "A",
               "Bee": "B",
               "See": "C",
               "Dee": "D",
               "He": "E",
               "Eff": "F",
               "Gee": "G",
               "Atch": "H"]

// This tells the speech recognizer to focus on those words
recognitionRequest.contextualStrings = Array(letters.keys)
Then, when receiving results in recognitionTask, we look up the dictionary to find which letter the recognized word corresponds to.
recognitionTask = speechRecognizer.recognitionTask(with: recognitionRequest) { result, error in
    var isFinal = false

    if let result = result {
        isFinal = result.isFinal

        let bestTranscription = result.bestTranscription
        // Extract the confidence the recognizer has in this word
        let confidence = bestTranscription.segments.isEmpty ? -1 : bestTranscription.segments[0].confidence
        print("Best \(bestTranscription.formattedString) - Confidence: \(confidence)")

        // Only keep results with some confidence
        if confidence > 0 {
            let detected = bestTranscription.formattedString
            // If the transcription matches one of our keys we can retrieve the letter
            if let match = letters.first(where: { $0.key.lowercased() == detected.lowercased() }) {
                print("Letter: \(match.value)")
                // And stop recording afterwards
                self.stopRecording()
            }
        }
    }

    if error != nil || isFinal {
        // The rest of the boilerplate from Apple's doc sample project...
    }
}
Notes:
- Set shouldReportPartialResults to true, otherwise it waits quite a while before sending the result.
- With recognitionRequest.contextualStrings, the confidence tends to skyrocket when it recognizes one of those strings. You could probably increase the confidence threshold to 0.3 or 0.4 (see the sketch at the end of this answer).

Some results:
Best Gee - Confidence: 0.0
// ... after a while, half a second maybe ...
Best Gee - Confidence: 0.864
Found G
(Apple's sample project to test it out: https://developer.apple.com/documentation/speech/recognizing_speech_in_live_audio)
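To illustrate the threshold note above, a minimal sketch with the cutoff pulled out as a constant; the 0.3 value is just the figure mentioned in the notes, and letterFor(_:in:) is a hypothetical helper, not part of the Speech API:

// Hypothetical tunable cutoff; 0.3 is only the value suggested in the notes.
let confidenceThreshold: Float = 0.3

// Case-insensitive lookup of the letter associated with a transcription.
func letterFor(_ transcription: String, in letters: [String: String]) -> String? {
    letters.first { $0.key.lowercased() == transcription.lowercased() }?.value
}

// Inside the recognition callback:
// if confidence > confidenceThreshold, let letter = letterFor(detected, in: letters) { ... }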
Upvotes: 0
Reputation: 863
Check this answer: https://stackoverflow.com/a/42925643/1637953
Declare a String variable to hold the recognized word, and start a Timer at the beginning of the audio session:

strWords = ""
var timer = NSTimer.scheduledTimerWithTimeInterval(2, target: self, selector: "didFinishTalk", userInfo: nil, repeats: false)

In recognitionTaskWithRequest, add the code below:

strWords = result.bestTranscription.formattedString

When didFinishTalk is called, then:

if strWords == "" {
    timer.invalidate()
    timer = NSTimer.scheduledTimerWithTimeInterval(2, target: self, selector: "didFinishTalk", userInfo: nil, repeats: false)
} else {
    // do your stuff using "strWords"
}
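The snippet above uses the old NSTimer / Swift 2 syntax from the linked answer. In current Swift, the same silence-timer idea (restart a short timer on every partial result, and treat the last transcription as final when it fires) might look roughly like this sketch; the type, property, and callback names are illustrative, and the 2-second interval is arbitrary:

import Foundation

final class SpeechSilenceWatcher {
    private var silenceTimer: Timer?
    private var latestTranscription = ""
    var onFinishedTalking: ((String) -> Void)?

    // Call this each time a partial result arrives.
    func noteTranscription(_ text: String) {
        latestTranscription = text
        restartSilenceTimer()
    }

    private func restartSilenceTimer() {
        silenceTimer?.invalidate()
        // Fire after 2 seconds without a new partial result (interval is arbitrary).
        silenceTimer = Timer.scheduledTimer(withTimeInterval: 2, repeats: false) { [weak self] _ in
            self?.handleFinishedTalking()
        }
    }

    private func handleFinishedTalking() {
        guard !latestTranscription.isEmpty else {
            restartSilenceTimer()  // nothing recognized yet, keep waiting
            return
        }
        onFinishedTalking?(latestTranscription)
    }
}

// Usage inside the recognition task callback:
// watcher.noteTranscription(result.bestTranscription.formattedString)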
Upvotes: 1