Reputation: 3
The story: The timestamps in the generated transcript file lag slightly behind the video. We expect each caption to appear at the moment the corresponding audio is played (e.g. in sync with the lips of the person who is talking), but the transcript text always appears a bit late. Is there a way to tell the service, through the API, that the timestamps should be a bit earlier? Maybe "experimentalOptions" could do the trick? Thank you.
I'm following this document: https://learn.microsoft.com/en-us/rest/api/media/transforms/create-or-update?tabs=HTTP#audioanalyzerpreset, but I found nothing in it that applies to my case.
Here is the example I'm working with:
WEBVTT

NOTE duration:"00:00:48"

NOTE recognizability:0.886

NOTE language:en-us

NOTE Confidence: 0.9478216

00:00:00.000 --> 00:00:01.680
You know, our mission at

NOTE Confidence: 0.9478216

00:00:01.680 --> 00:00:03.360
Microsoft is to empower every

NOTE Confidence: 0.9478216

00:00:03.360 --> 00:00:04.572
person and every organization

NOTE Confidence: 0.9478216

00:00:04.572 --> 00:00:06.390
on the planet to be able

NOTE Confidence: 0.768422245

00:00:06.390 --> 00:00:08.060
to achieve more. Empowerment is
Upvotes: 0
Views: 107
Reputation: 3163
No API is provided for modifying the timestamps in the generated VTT. However, you could post-process the VTT file yourself and adjust the timestamps to suit your requirement.
We do not have a sample for caption reprocessing, but online caption editors can accomplish this. Another option is a plain text editor, although editors tailored specifically to VTT files also exist.
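This is not an official sample, but the kind of post-processing described above could look roughly like the following Python sketch. It assumes the timestamps always use the `HH:MM:SS.mmm` form seen in the transcript in the question, and that a single fixed offset is enough to fix the sync. The function names `shift_timestamp` and `shift_vtt` are made up for illustration.

```python
import re

# Matches WebVTT timestamps written as HH:MM:SS.mmm (assumption: the hours
# field is always present, as in the transcript shown in the question).
TIMESTAMP = re.compile(r"(\d{2,}):(\d{2}):(\d{2})\.(\d{3})")


def shift_timestamp(match, offset_ms):
    """Return the matched timestamp moved by offset_ms milliseconds."""
    hours, minutes, seconds, millis = (int(g) for g in match.groups())
    total_ms = ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis + offset_ms
    total_ms = max(total_ms, 0)  # clamp so no cue starts before 00:00:00.000
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"


def shift_vtt(text, offset_ms):
    """Shift every cue timing line by offset_ms (negative means earlier)."""
    shifted = []
    for line in text.splitlines():
        # Only cue timing lines contain "-->"; NOTE lines and cue text
        # are copied through unchanged.
        if "-->" in line:
            line = TIMESTAMP.sub(lambda m: shift_timestamp(m, offset_ms), line)
        shifted.append(line)
    return "\n".join(shifted) + "\n"
```

For example, `shift_vtt(vtt_text, -300)` would move every cue 300 ms earlier. The 300 ms value is only a placeholder; the right offset has to be measured against the actual video.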
I have relayed this feedback internally to our product engineering team.
Upvotes: 0