Reputation: 1726
I have a 6-second audio recording (ar-01.wav) in WAV format. I want to transcribe the audio file to text using Amazon services. For that purpose I created a bucket named test-voip and uploaded the audio file to it. When I convert the speech to text, the 6-second clip takes 13.12 seconds to process. Here is my code snippet:
import boto3
import requests

session = boto3.Session(aws_access_key_id=aws_access_key_id,
                        aws_secret_access_key=aws_secret_access_key)
transcribe = session.client('transcribe', region_name='us-east-1')
job_name = "audio_text_trail9"
job_uri = "https://test-voip.s3.amazonaws.com/ar-01.wav"
transcribe.start_transcription_job(
    TranscriptionJobName=job_name,
    Media={'MediaFileUri': job_uri},
    MediaFormat='wav',
    LanguageCode='en-US',
    MediaSampleRateHertz=16000
)
# Poll until the job reaches a terminal state
while True:
    status = transcribe.get_transcription_job(TranscriptionJobName=job_name)
    if status['TranscriptionJob']['TranscriptionJobStatus'] in ['COMPLETED', 'FAILED']:
        break
print("converted to text")
myurl = status['TranscriptionJob']['Transcript']['TranscriptFileUri']
print(myurl)
# Download the transcript JSON and pull out the plain text
text_data = requests.get(myurl).json()['results']['transcripts'][0]['transcript']
print(text_data)
The code works and the accuracy is excellent even on noisy audio, but the job takes too long. Where is my mistake, and what makes transcription take so much time? Once I have the transcribed JSON, extracting the information I need takes negligible time. How can I speed up the transcription, or is there a better way to do it?
Upvotes: 3
Views: 7779
Reputation: 96
I searched for a transcription speed guarantee, with no luck.
In this forum post (requires an AWS account), a poster ran a benchmark with the following results:
What seems to be an official Amazon source states that "At this time, transcription speeds are better optimized for audio longer than 30 seconds. You'll start to see a better processing time to audio duration time ratio when the audio file length is about 2 minutes or longer. Having said, this we are working hard to enhance transcription speeds overall"
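To make the "processing time to audio duration" ratio concrete, here is a minimal sketch that computes it for the clip in the question (6 seconds of audio, 13.12 seconds of processing). The function name real_time_factor is made up for illustration and is not part of any AWS API.

```python
def real_time_factor(processing_seconds, audio_seconds):
    """Return processing time divided by audio duration (lower is faster)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


# Figures reported in the question: a 6 s clip took 13.12 s to transcribe.
question_ratio = real_time_factor(13.12, 6)
print(f"{question_ratio:.2f}x the audio duration")  # prints 2.19x the audio duration
```

A ratio above 1 means the job took longer than the audio itself, which is consistent with the statement that short files are the worst case.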
I hope this helps other researchers.
Upvotes: 3
Reputation: 5015
For me, AWS Transcribe took 20 minutes to transcribe a 17-minute file. One possible idea is to split the audio file into chunks and transcribe them in parallel, using multiprocessing on a 16-core EC2 instance such as a g3.4xlarge.
Split the audio file into 16 parts with a silence threshold of -20 dB, then convert to .wav:
$ sudo apt-get install mp3splt
$ sudo apt-get install ffmpeg
$ mp3splt -s -p th=-20,nt=16 splitted.mp3
$ ffmpeg -i splitted.mp3 splitted.wav
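Since the split step produces several files, the conversion has to run once per chunk. A rough sketch of that loop in Python follows; the helper names and the "*.mp3" glob pattern are assumptions for illustration, so adjust the pattern to whatever file names mp3splt actually wrote on your machine.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_commands(chunk_dir, pattern="*.mp3"):
    """Return one ffmpeg command per matching chunk, sorted by file name."""
    commands = []
    for src in sorted(Path(chunk_dir).glob(pattern)):
        dst = src.with_suffix(".wav")  # same name, .wav extension
        # -y overwrites an existing output file, -i names the input file
        commands.append(["ffmpeg", "-y", "-i", str(src), str(dst)])
    return commands


def convert_chunks(chunk_dir, pattern="*.mp3"):
    """Run ffmpeg for every chunk; raises CalledProcessError on failure."""
    for cmd in build_ffmpeg_commands(chunk_dir, pattern):
        subprocess.run(cmd, check=True)
```

Calling convert_chunks(".") in the directory that holds the chunks would leave a .wav file next to each .mp3, ready to upload to the bucket.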
Then run 16 worker processes that transcribe simultaneously, mapping a function that calls transcribe.start_transcription_job over each TranscriptionJobName and job_uri:
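import multiprocessing
import time

import boto3

output = []
data = range(0, 16)

def f(x):
    # Create the client inside the worker so each process has its own;
    # region and credentials are taken from the environment
    transcribe = boto3.client('transcribe')
    job_name = "Name" + str(x)
    job_uri = "https://s3.amazonaws.com/bucket/splitted" + str(x) + ".wav"
    transcribe.start_transcription_job(
        TranscriptionJobName=job_name,
        Media={'MediaFileUri': job_uri},
        MediaFormat='wav',
        LanguageCode='pt-BR',
        OutputBucketName="bucket",
        MediaSampleRateHertz=8000,
        Settings={"MaxSpeakerLabels": 2,
                  "ShowSpeakerLabels": True})
    # Poll until the job reaches a terminal state
    while True:
        status = transcribe.get_transcription_job(TranscriptionJobName=job_name)
        if status['TranscriptionJob']['TranscriptionJobStatus'] in ['COMPLETED', 'FAILED']:
            break
        time.sleep(5)  # pause between polls instead of calling the API in a tight loop
    # Return the final state so the caller can tell which chunks failed
    return status['TranscriptionJob']['TranscriptionJobStatus']

def mp_handler():
    # One worker process per chunk
    with multiprocessing.Pool(16) as p:
        r = p.map(f, data)
    return r

if __name__ == '__main__':
    output.append(mp_handler())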
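The workers above only wait for the jobs to finish, so the chunk transcripts still have to be collected and put back in order. Below is a minimal sketch of that step. It assumes each output JSON has the same results/transcripts layout that the question's code reads, and it assumes (not verified here) that with OutputBucketName set, each transcript is stored in that bucket under the key "<job name>.json". All helper names are hypothetical.

```python
import json


def transcript_text(doc):
    """Extract the transcript string from one parsed Transcribe output document."""
    return doc["results"]["transcripts"][0]["transcript"]


def join_transcripts(docs):
    """Join chunk transcripts in the order given, skipping empty chunks."""
    parts = [transcript_text(d).strip() for d in docs]
    return " ".join(p for p in parts if p)


def fetch_chunk_docs(s3_client, bucket, job_names):
    """Download and parse one JSON document per job name.

    Assumption: each transcript lives at the key job_name + ".json".
    s3_client is expected to behave like boto3.client("s3").
    """
    docs = []
    for name in job_names:
        obj = s3_client.get_object(Bucket=bucket, Key=name + ".json")
        docs.append(json.loads(obj["Body"].read()))
    return docs
```

With s3 = boto3.client("s3"), calling join_transcripts(fetch_chunk_docs(s3, "bucket", ["Name" + str(x) for x in range(16)])) would return the text of all 16 chunks in order, provided every job completed.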
Upvotes: 2