Reputation: 59
Right now I need the timestamp of each word in a speech-to-text transcription, that is, the time at which each word STARTS, along with its duration.
However, when the results of each transcription are recorded, the timestamps and durations are only populated once the transcription has fully completed.
Example code (from Apple):
// Configure request so that results are returned before audio recording is finished
recognitionRequest.shouldReportPartialResults = true

// A recognition task represents a speech recognition session.
// We keep a reference to the task so that it can be cancelled.
recognitionTask = speechRecognizer.recognitionTask(with: recognitionRequest) { result, error in
    var isFinal = false
    if let result = result {
        self.textView.text = result.bestTranscription.formattedString
        isFinal = result.isFinal
        for word in result.bestTranscription.segments {
            print("\(word.substring) \(word.timestamp)")
        }
    }
    if error != nil || isFinal {
        self.audioEngine.stop()
        inputNode.removeTap(onBus: 0)
        self.recognitionRequest = nil
        self.recognitionTask = nil
        self.recordButton.isEnabled = true
        self.recordButton.setTitle("Start Recording", for: [])
    }
}
Does anybody have ideas on how to get word timestamps in real time? They essentially return 0 every time until the transcription completes. I'm getting the example code from here:
https://developer.apple.com/library/prerelease/content/samplecode/SpeakToMe/Introduction/Intro.html
Upvotes: 1
Views: 620
Reputation: 25220
Calculating timestamps is a computationally expensive operation, so it is usually not done during decoding, only as a post-processing step on the results. Because of this, many engines cannot provide timestamps for partial results.
If you still want partial timestamps, you will need to consider a different library, and probably a different algorithm too.
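Given that constraint, with Apple's Speech framework the reliable place to read per-word timing is the final result, where `SFTranscriptionSegment` exposes both `timestamp` and `duration` (both in seconds). A minimal sketch, assuming a result handler like the one in the question (the `handle` function name is hypothetical):

```swift
import Speech

// Hypothetical handler: read per-word timing only from the final result,
// since partial results may report zeroed timestamps.
func handle(result: SFSpeechRecognitionResult?, error: Error?) {
    guard let result = result, result.isFinal else { return }
    for segment in result.bestTranscription.segments {
        // timestamp: seconds from the start of the audio at which the word begins
        // duration: how long the word was spoken, in seconds
        print("\(segment.substring): starts at \(segment.timestamp)s, lasts \(segment.duration)s")
    }
}
```

This doesn't give you live timing, but it does give you the start time plus duration the question asks for once `isFinal` is true.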
Upvotes: 1