Reputation: 59
Right now I need the timestamp of each word in a speech-to-text transcription, that is, the time at which each word STARTS, along with its duration.
However, when the results of each transcription are recorded, the timestamps and durations are only populated once the transcription has fully completed.
Example code (from Apple):
// Configure request so that results are returned before audio recording is finished
recognitionRequest.shouldReportPartialResults = true

// A recognition task represents a speech recognition session.
// We keep a reference to the task so that it can be cancelled.
recognitionTask = speechRecognizer.recognitionTask(with: recognitionRequest) { result, error in
    var isFinal = false
    if let result = result {
        self.textView.text = result.bestTranscription.formattedString
        isFinal = result.isFinal
        for word in result.bestTranscription.segments {
            print("\(word.substring) \(word.timestamp)")
        }
    }
    if error != nil || isFinal {
        self.audioEngine.stop()
        inputNode.removeTap(onBus: 0)
        self.recognitionRequest = nil
        self.recognitionTask = nil
        self.recordButton.isEnabled = true
        self.recordButton.setTitle("Start Recording", for: [])
    }
}
Does anybody have ideas on how to get word timestamps in real time? They essentially return 0 every time until the transcription completes. I'm getting the example code from here:
https://developer.apple.com/library/prerelease/content/samplecode/SpeakToMe/Introduction/Intro.html
Upvotes: 1
Views: 620
Reputation: 25220
Calculating timestamps is a computationally expensive operation, so it is usually not done during decoding, only as a post-processing step on the results. Because of this, many engines cannot provide timestamps for partial results.
If you still want partial timestamps, you will need to consider a different library, and probably a different algorithm too.
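Given that constraint, with Apple's Speech framework the reliable place to read per-word timing is the final result, where `SFTranscriptionSegment` exposes both `timestamp` and `duration` (both in seconds). A minimal sketch, assuming a result handler like the one in the question (the `handle` function name is hypothetical):

```swift
import Speech

// Hypothetical handler: read per-word timing only from the final result,
// since partial results may report zeroed timestamps.
func handle(result: SFSpeechRecognitionResult?, error: Error?) {
    guard let result = result, result.isFinal else { return }
    for segment in result.bestTranscription.segments {
        // timestamp: seconds from the start of the audio at which the word begins
        // duration: how long the word was spoken, in seconds
        print("\(segment.substring): starts at \(segment.timestamp)s, lasts \(segment.duration)s")
    }
}
```

This doesn't give you live timing, but it does give you the start time plus duration the question asks for once `isFinal` is true.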
Upvotes: 1