json amazon-web-services speech-to-text webvtt

Reputation: 569

How can I convert an Amazon Transcribe JSON response to a caption format (SRT, WebVTT, etc.)?

I'm trying to find a package that converts the JSON response from the Amazon Transcribe service, with no luck so far.

You can see an example of the JSON in the JavaScript part of the Fiddle.

I don't want to take the naive approach and just bundle, say, 10 words together, as that would space the captions in a weird way.

I'd even accept a programmatic way of doing it using the Google Speech service or Speechmatics. They all return a JSON file broken down by word.

Has anyone worked with this before?

Thanks!

Upvotes: 14

Answers (10)

Paco Hope

Reputation: 450

This is an old question, and 5 years ago these answers were necessary. I happened to find this question and discovered that in 2022, AWS made SRT and VTT available as output options directly in the service. The service documentation is here.
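As a rough sketch of what requesting subtitles looks like with boto3: the `Subtitles` parameter of `start_transcription_job` takes the formats you want. The job name, bucket, and the helper names `build_job_request` and `start_job` below are placeholders I made up for illustration, so adjust them to your setup.

```python
def build_job_request(job_name, media_uri, language_code="en-US",
                      formats=("srt", "vtt")):
    """Build keyword arguments for Transcribe's StartTranscriptionJob call."""
    return {
        "TranscriptionJobName": job_name,
        "Media": {"MediaFileUri": media_uri},
        "LanguageCode": language_code,
        # Ask the service to write caption files next to the JSON transcript.
        "Subtitles": {"Formats": list(formats), "OutputStartIndex": 1},
    }


def start_job(request):
    """Submit the job. Needs boto3 installed and AWS credentials configured."""
    import boto3  # imported here so the request builder works without boto3

    client = boto3.client("transcribe")
    return client.start_transcription_job(**request)


# Hypothetical job name and S3 location, for illustration only.
request = build_job_request("demo-job", "s3://example-bucket/audio.mp3")
# start_job(request)  # uncomment to actually submit the job
```

Once the job completes, `get_transcription_job` should report where the caption files were written (under `Subtitles` in the job description), so no conversion step is needed.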

Upvotes: 1

F. Lumley

Reputation: 765

I built a web app for this purpose (viewing and editing AWS Transcribe JSON files): https://scription.app

It separates speakers, highlights low confidence words and links text to audio playback (if you load your audio file). It’s still a beta version but hopefully helpful to anyone coming across this post!

Upvotes: 1

Apoorv Mote

Reputation: 523

Inspired by Yash's answer, I took his code and made small changes. Feel free to use it.

https://apoorv.blog/aws-transcribe-json-to-srt.html

I personally use this tool for my own purposes, so expect it to stay updated.

Upvotes: 1

Jorge P.

Reputation: 303

I ended up creating a Bash script to convert the AWS Transcribe JSON file into SRT.

It uses 'jq' to parse the JSON file.

https://github.com/nicolasps/aws-transcribe-to-srt

Upvotes: 0

Tim Clauss

Reputation: 91

I've used this Python script from GitHub and it formats the transcript really nicely into docx format. The output even includes scatterplots of the confidence levels of words and colors the lower-confidence words differently.

https://github.com/kibaffo33/aws_transcribe_to_docx

This worked really well for me, and I think you could have it output HTML fairly simply if you wanted to alter the Python script.

Upvotes: 5

Daniel Angel

Reputation: 569

Here's a gist that you might be able to use, or that might give you an idea of what's required. It's basically what I ended up doing. https://gist.github.com/mwleinad/67a39d7d723f0a2ed076ed2485e098ae

Upvotes: 0

Raj

Reputation: 1

Here is a simple utility script that I found to convert the Amazon Transcribe .json transcript into a more readable transcript:

https://github.com/purdy/aws-transcribe-transcript

Upvotes: 0

Leon Nortje

Reputation: 1

I came across this question after looking for a solution for a while. Using some of the information in the other links, I got close to something I could use, but not to the exact answer, so I decided to complete the solution.

Step 1 - Get an HTML template with a text block, speaker names, and a button that runs the JavaScript.
Step 2 - Paste the JSON received from AWS into the text block.
Step 3 - Click the button.

The HTML page can be found here: https://js.do/lnortje_gmail-com/amazon-transcribe-to-html-converter

One thing I found useful is knowing the confidence of the transcription. It helps to show where possible issues might be, and showing the exact time at which each piece was transcribed lets you jump to that place in the recording.

Well, use it and enjoy, it might help someone some day :)
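For anyone who wants the confidence and timestamp idea without the HTML page, here is a minimal Python sketch. It is not the code behind the linked page. It assumes the usual Transcribe layout, where `results.items` holds one entry per word with `start_time` and `alternatives[0].confidence` stored as strings; the function name and the 0.8 threshold are my own arbitrary choices.

```python
def low_confidence_words(transcript, threshold=0.8):
    """Return (start_seconds, word, confidence) for words below the threshold."""
    flagged = []
    for item in transcript["results"]["items"]:
        # Punctuation items carry no timestamps, so skip them.
        if item["type"] != "pronunciation":
            continue
        best = item["alternatives"][0]
        confidence = float(best["confidence"])
        if confidence < threshold:
            flagged.append((float(item["start_time"]), best["content"], confidence))
    return flagged


# Tiny hand-written sample in the assumed Transcribe layout.
sample = {"results": {"items": [
    {"type": "pronunciation", "start_time": "0.0", "end_time": "0.5",
     "alternatives": [{"confidence": "0.99", "content": "Hello"}]},
    {"type": "pronunciation", "start_time": "0.5", "end_time": "1.0",
     "alternatives": [{"confidence": "0.42", "content": "whirled"}]},
    {"type": "punctuation",
     "alternatives": [{"confidence": "0.0", "content": "."}]},
]}}

for start, word, confidence in low_confidence_words(sample):
    print(f"{start:7.2f}s  {word}  ({confidence:.2f})")
```

Each flagged entry gives you the position in the recording to listen to again.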

Upvotes: 0

Yash Gadhiya

Reputation: 185

You have probably found a way to do this or written a script by now. I also looked for a ready-made solution and ended up writing some JavaScript code to generate SRT from the JSON output of Amazon Transcribe.

https://www.yash.info/aws-srt-creator.htm

I am breaking sentences at the period (.). It's a standalone HTML file. Feel free to download and modify it as required.
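The sentence-splitting approach can also be sketched in a few lines of Python. To be clear, this is not the JavaScript from the page above, just a minimal illustration under some assumptions: the JSON has the usual `results.items` list, a cue ends at `.`, `?` or `!`, and `max_words` is a cap I added so that a long sentence does not become one huge caption.

```python
SENTENCE_END = {".", "?", "!"}


def srt_timestamp(seconds):
    """Format seconds as the HH:MM:SS,mmm timestamp SRT expects."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def transcribe_to_srt(transcript, max_words=12):
    """Group word items into cues that end at sentence punctuation."""
    cues = []  # list of (start_seconds, end_seconds, text)
    words, start, end = [], None, None

    def flush():
        nonlocal words, start, end
        if words:
            cues.append((start, end, " ".join(words)))
        words, start, end = [], None, None

    for item in transcript["results"]["items"]:
        content = item["alternatives"][0]["content"]
        if item["type"] == "punctuation":
            if words:
                words[-1] += content  # attach punctuation to the previous word
            if content in SENTENCE_END:
                flush()
            continue
        if len(words) >= max_words:
            flush()  # cap very long sentences
        if start is None:
            start = float(item["start_time"])
        end = float(item["end_time"])
        words.append(content)
    flush()  # emit any trailing words without closing punctuation

    blocks = [
        f"{i}\n{srt_timestamp(s)} --> {srt_timestamp(e)}\n{text}\n"
        for i, (s, e, text) in enumerate(cues, start=1)
    ]
    return "\n".join(blocks)


def _word(text, start, end):
    return {"type": "pronunciation", "start_time": start, "end_time": end,
            "alternatives": [{"confidence": "1.0", "content": text}]}


def _punct(text):
    return {"type": "punctuation",
            "alternatives": [{"confidence": "0.0", "content": text}]}


# Hand-written sample; real input would come from json.load() on the job output.
sample = {"results": {"items": [
    _word("Hello", "0.0", "0.5"), _word("world", "0.5", "1.0"), _punct("."),
    _word("How", "1.5", "1.8"), _word("are", "1.8", "2.0"),
    _word("you", "2.0", "2.4"), _punct("?"),
]}}

print(transcribe_to_srt(sample))
```

Splitting on punctuation keeps captions aligned with natural pauses, which avoids the odd spacing you get from fixed-size word bundles. For WebVTT you would mainly add a `WEBVTT` header line and use a dot instead of a comma in the timestamps.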

Upvotes: 10

Jeankowkow

Reputation: 834

There is something here (aws-transcribe-to-vtt) but I haven't been able to test it yet...

Upvotes: 0
