Reputation: 569
I'm trying to find a package that converts the JSON response from the Amazon AWS Transcribe service into captions (for example SRT or VTT), with no luck.
You can see an example of the JSON in the JavaScript part of the Fiddle.
I wouldn't like to take the naive approach and just "bundle" like 10 words together as that would space the captions in a weird way.
I'd even accept a programmatic way of doing it using the Google Speech service or Speechmatics. They all return a json file broken down by word.
Has anyone worked with this before?
Thanks!
Upvotes: 14
Views: 17080
Reputation: 450
This is an old question, and 5 years ago these answers were necessary. I happened to find this question and discovered that in 2022, AWS made SRT and VTT output options directly in the service. The service documentation is here.
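As a rough sketch of what requesting subtitles from the service might look like with boto3: the helper name `build_subtitle_job_request`, the job name, and the S3 URI below are made up for illustration, and the `Subtitles` parameter of `start_transcription_job` should be checked against the current boto3 documentation before relying on it.

```python
def build_subtitle_job_request(job_name, media_uri, language_code="en-US"):
    """Build keyword arguments for a transcription job that also emits subtitles.

    Hypothetical helper: it only assembles the request dictionary and does
    not call AWS itself.
    """
    return {
        "TranscriptionJobName": job_name,
        "Media": {"MediaFileUri": media_uri},
        "LanguageCode": language_code,
        # Ask for both subtitle formats; cue numbering starts at 1.
        "Subtitles": {"Formats": ["srt", "vtt"], "OutputStartIndex": 1},
    }


request = build_subtitle_job_request("demo-job", "s3://example-bucket/audio.mp3")

# Submitting the job needs boto3 and AWS credentials, so it is left commented out:
# import boto3
# boto3.client("transcribe").start_transcription_job(**request)
```

The request is built separately from the call so it can be inspected or logged before anything is sent to AWS.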
Upvotes: 1
Reputation: 765
I built a web app for this purpose (viewing and editing aws transcribe JSON files): https://scription.app
It separates speakers, highlights low confidence words and links text to audio playback (if you load your audio file). It’s still a beta version but hopefully helpful to anyone coming across this post!
Upvotes: 1
Reputation: 523
Inspired by yash's answer, I took it and made small changes. Feel free to use it.
https://apoorv.blog/aws-transcribe-json-to-srt.html
I personally use this tool for my own purposes, so expect it to stay updated.
Upvotes: 1
Reputation: 303
I ended up creating a Bash script to convert the AWS Transcribe JSON file into SRT.
It uses 'jq' to parse the JSON file.
https://github.com/nicolasps/aws-transcribe-to-srt
Upvotes: 0
Reputation: 91
I've used this Python script from GitHub and it formats really nicely into docx format. The output even includes scatterplots of the confidence levels of words, and it colors lower-confidence words differently.
https://github.com/kibaffo33/aws_transcribe_to_docx
This worked really well for me, but I think you could have this go to HTML fairly simply if you wanted to alter the Python script.
Upvotes: 5
Reputation: 569
Here's a gist that you might be able to use, or that might give you an idea of what's required. It's basically what I ended up doing. https://gist.github.com/mwleinad/67a39d7d723f0a2ed076ed2485e098ae
Upvotes: 0
Reputation: 1
Here is a simple utility script that I found to convert the Amazon Transcribe .json transcript into a more readable transcript
https://github.com/purdy/aws-transcribe-transcript
Upvotes: 0
Reputation: 1
I came across this question while looking for the same thing. Using the information in some of the other links got me close to something I could use, but not all the way there, so I decided to complete the solution.
Step 1 - Get an HTML template with the text block, the speaker names, and a button that runs the JavaScript.
Step 2 - Paste the JSON received from AWS into the text block.
Step 3 - Click the button.
The HTML page can be found here: https://js.do/lnortje_gmail-com/amazon-transcribe-to-html-converter
One of the things I found useful is knowing the confidence of the transcription, which helps show where possible issues might be. Showing the exact time at which each piece was transcribed also lets you go to that place in the recording.
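To illustrate the confidence idea, here is a minimal Python sketch (not the code behind the page above). It assumes the usual Transcribe layout, where `results.items` holds one entry per word with `start_time` and a string `confidence` inside `alternatives`. The function name `low_confidence_words`, the 0.8 threshold, and the sample data are all made up for illustration.

```python
def low_confidence_words(transcript, threshold=0.8):
    """Return (start_seconds, word, confidence) for words scored below threshold."""
    flagged = []
    for item in transcript["results"]["items"]:
        # Punctuation items carry no timestamps, so only look at spoken words.
        if item["type"] != "pronunciation":
            continue
        best = item["alternatives"][0]
        confidence = float(best["confidence"])
        if confidence < threshold:
            flagged.append((float(item["start_time"]), best["content"], confidence))
    return flagged


# Tiny made-up sample in the assumed Transcribe shape.
sample = {"results": {"items": [
    {"type": "pronunciation", "start_time": "0.0", "end_time": "0.4",
     "alternatives": [{"content": "Hello", "confidence": "0.99"}]},
    {"type": "pronunciation", "start_time": "0.4", "end_time": "0.9",
     "alternatives": [{"content": "whirled", "confidence": "0.41"}]},
    {"type": "punctuation",
     "alternatives": [{"content": ".", "confidence": "0.0"}]},
]}}

for start, word, confidence in low_confidence_words(sample):
    print(f"{start:7.2f}s  {word}  ({confidence:.2f})")
```

Each flagged word comes with its start time, so you can jump straight to that point in the recording and check it by ear.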
Well, use it and enjoy, might help someone some day :)
Upvotes: 0
Reputation: 185
You have probably found a way to do this or written a script by now. I also looked for a ready-made solution and ended up writing some JavaScript code to generate SRT from the JSON output of Amazon Transcribe.
https://www.yash.info/aws-srt-creator.htm
I am breaking sentences at the period (.). It's a standalone HTML file. Feel free to download and modify it as required.
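For anyone who wants the same idea outside the browser, here is a rough Python sketch of sentence-based splitting. It is not the code from the page above. It assumes the usual Transcribe layout (`results.items` with `type`, `start_time`, `end_time`, and `alternatives[0].content`). The names `srt_time` and `transcribe_to_srt` and the `max_words` cap are my own assumptions, and `transcript.json` is a placeholder filename.

```python
import json


def srt_time(seconds):
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def transcribe_to_srt(transcript, max_words=12):
    """Build SRT text; a caption ends at '.', '?' or '!', or once it has max_words words."""
    cues = []  # finished captions as (start, end, text)
    words, start, end = [], None, None

    def flush():
        nonlocal words, start, end
        if words:
            cues.append((start, end, " ".join(words)))
        words, start, end = [], None, None

    for item in transcript["results"]["items"]:
        content = item["alternatives"][0]["content"]
        if item["type"] == "punctuation":
            if words:
                words[-1] += content  # attach punctuation to the previous word
                if content in (".", "?", "!"):
                    flush()
            continue
        if len(words) >= max_words:
            flush()  # cap very long sentences so captions stay readable
        if start is None:
            start = float(item["start_time"])
        end = float(item["end_time"])
        words.append(content)
    flush()

    blocks = [
        f"{number}\n{srt_time(s)} --> {srt_time(e)}\n{text}\n"
        for number, (s, e, text) in enumerate(cues, 1)
    ]
    return "\n".join(blocks)


# Usage (placeholder filename):
# with open("transcript.json") as fh:
#     print(transcribe_to_srt(json.load(fh)))
```

Breaking at sentence ends rather than every N words keeps captions aligned with natural pauses. The word cap is only a fallback for long sentences.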
Upvotes: 10
Reputation: 834
There is something here (aws-transcribe-to-vtt) but I haven't been able to test it yet...
Upvotes: 0