Reputation: 63
I am currently exploring OpenAI's Whisper API and have been trying to extract additional information from the recognition results. Specifically, I'm interested in obtaining word timestamps alongside the transcribed text.
When using the local Whisper model, I can get a result like this:
[
{"start": 0.0, "end": 0.5, "word": "hello"},
{"start": 0.5, "end": 1.0, "word": "world"}
]
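For context, this is roughly the kind of local call that produces that output (a sketch using the open-source whisper package; the model size and file name are just examples):

import whisper

# Open-source Whisper: word-level timestamps are enabled with word_timestamps=True
model = whisper.load_model("base")  # model size is just an example
result = model.transcribe("audio.mp3", word_timestamps=True)

# Each segment carries a "words" list with per-word start/end times
for segment in result["segments"]:
    for word in segment["words"]:
        print({"start": word["start"], "end": word["end"], "word": word["word"]})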
I have seen this working with the local model in this notebook:

I tried to pass the same kind of parameter to the API request, but it didn't work.
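Roughly, my attempt looked like this (a sketch using the pre-1.0 openai Python library; the file name is just an example and the API key is read from the environment):

import openai  # assumes the pre-1.0 openai package and OPENAI_API_KEY set in the environment

audio_file = open("audio.mp3", "rb")  # example file name

transcript = openai.Audio.transcribe(
    "whisper-1",
    audio_file,
    response_format="verbose_json",
    # word_timestamps=True  # <- passing this the way the local model accepts it is not recognized
)
print(transcript)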
Upvotes: 1
Views: 1122