Gaurav Jaglan

Reputation: 1

AWS Transcribe, Unable to find any Boto3 code snippet to utilize the custom vocabulary

I am using AWS Transcribe for speech recognition. I have created a custom vocabulary, but I cannot find any Boto3 code snippet that shows how to use it from Python. My current code is below.

import boto3

client_transcribe = boto3.client('transcribe')
client_transcribe.start_transcription_job(
    TranscriptionJobName=job_name,
    Media={'MediaFileUri': file_url},
    MediaFormat='mp4',
    LanguageCode='en-US',
    OutputBucketName=bucket)

Upvotes: 0

Views: 484

Answers (1)

Joel Van Hollebeke

Reputation: 770

The vocabulary name is passed as the VocabularyName key of the Settings dictionary, which is a parameter of the start_transcription_job method.

Reference: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/transcribe.html#TranscribeService.Client.start_transcription_job

Example:

settings = {
    'VocabularyName': 'your-custom-vocabulary-name-goes-here'
}

client_transcribe.start_transcription_job(
    TranscriptionJobName=job_name,
    LanguageCode='your-language-code-goes-here',
    Settings=settings,
    MediaFormat='mp4',
    OutputBucketName=bucket,
    Media={
        'MediaFileUri': file_url
    })

If you need to determine the language code of your vocabulary and have the AWS CLI installed, you can run the following command from your terminal:

aws transcribe get-vocabulary --vocabulary-name {your-custom-vocabulary-name}

It returns a response such as:

{
  "LastModifiedTime": 1573523589.419,
  "VocabularyName": "redacted",
  "DownloadUri": "redacted",
  "LanguageCode": "en-US",
  "VocabularyState": "READY"
}

For example, if the language code for your vocabulary is en-US, then use that language code when calling start_transcription_job.
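The same lookup can also be done from Python with the client's get_vocabulary call, so the language code does not have to be copied by hand. The following is only a rough sketch: the helper names build_job_args and start_job_with_vocabulary are made up for illustration, and the READY check is an assumption about how you might want to guard against a vocabulary that is still being processed.

```python
def build_job_args(client, job_name, file_url, bucket, vocabulary_name,
                   media_format="mp4"):
    """Build keyword arguments for start_transcription_job.

    Hypothetical helper: looks up the vocabulary with get_vocabulary and
    reuses its LanguageCode so the job and the vocabulary always agree.
    """
    vocab = client.get_vocabulary(VocabularyName=vocabulary_name)

    # Assumption: refuse to start a job unless the vocabulary is READY.
    if vocab["VocabularyState"] != "READY":
        raise RuntimeError(
            "Vocabulary %r is in state %s, not READY"
            % (vocabulary_name, vocab["VocabularyState"])
        )

    return {
        "TranscriptionJobName": job_name,
        "LanguageCode": vocab["LanguageCode"],
        "MediaFormat": media_format,
        "Media": {"MediaFileUri": file_url},
        "OutputBucketName": bucket,
        "Settings": {"VocabularyName": vocabulary_name},
    }


def start_job_with_vocabulary(job_name, file_url, bucket, vocabulary_name):
    """Hypothetical wrapper: create a real client and start the job."""
    import boto3  # imported here so build_job_args can be tried without AWS

    client = boto3.client("transcribe")
    args = build_job_args(client, job_name, file_url, bucket, vocabulary_name)
    return client.start_transcription_job(**args)
```

Calling start_job_with_vocabulary(job_name, file_url, bucket, 'your-custom-vocabulary-name-goes-here') would then start the job with whatever language code the vocabulary reports. Keeping the argument building separate from the client creation makes it possible to check the arguments with a stub client before touching AWS.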

Hope this helps!

Upvotes: 1
