asdwasow

Reputation: 91

Amazon Transcribe on S3 Upload: "[ERROR] BadRequestException: URI provided doesn't point to an S3 object"

I'm trying out Amazon Transcribe on a collection of media files. I adapted the sample code from the docs, using this series as a reference, so that a transcription job starts for any upload to my designated S3 media folder, but it fails on my test file.

UPLOAD BUCKET/FOLDER path:

'MediaFileUri': https://us-west-2.console.aws.amazon.com/s3/buckets/upload-asr/mediaupload/file.mp4

I've verified that the file exists and that the bucket permissions grant access to the Amazon Transcribe service. I can start a transcription job manually with the same URL, but not through the SDK. I've also hard-coded the path above in the function, with the same result. I suspect a URL path issue, but I haven't found much on the subject, so I'm checking whether I've made an obvious error.

import json
import time
import boto3
from urllib.request import urlopen


def lambda_handler(event, context):
    transcribe = boto3.client("transcribe")
    s3 = boto3.client("s3")

    if event:
        # Take the first record of the S3 upload notification.
        file_obj = event["Records"][0]
        bucket_name = str(file_obj['s3']['bucket']['name'])
        file_name = str(file_obj['s3']['object']['key'])
        # Use the text after the last dot as the media format, so keys
        # containing extra dots still yield the real extension.
        file_type = file_name.rsplit(".", 1)[-1]
        s3_uri = create_uri(bucket_name, file_name)
        # Reuse the Lambda request ID as the transcription job name.
        job_name = context.aws_request_id

        transcribe.start_transcription_job(TranscriptionJobName=job_name,
                                           Media={'MediaFileUri': s3_uri},
                                           OutputBucketName="bucket-name",
                                           MediaFormat=file_type,
                                           LanguageCode="en-US")

def create_uri(bucket_name, file_name):

CloudWatch Log Failure Report:

[ERROR] BadRequestException: An error occurred (BadRequestException) when calling the StartTranscriptionJob operation: 
The URI that you provided doesn't point to an S3 object. Make sure that the object exists and try your request again.

Traceback (most recent call last):
  File "/var/task/lambda_function.py", line 25, in lambda_handler
    LanguageCode = "en-US")
  File "/var/runtime/botocore/client.py", line 320, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/var/runtime/botocore/client.py", line 623, in _make_api_call
    raise error_class(parsed_response, operation_name) 

SIMILAR: https://forums.aws.amazon.com/thread.jspa?messageID=876906&#876906

Upvotes: 5

Views: 6952

Answers (2)

msklc

Reputation: 604

There may be two other reasons for this error.

1- OutputBucketName: The input bucket is usually set correctly, but if OutputBucketName is empty, or still holds the placeholder value from copy-pasted code, you can get the same error.

2- Permissions: Check the permissions for both the input and the output bucket. Public access is not enabled by default.
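
Both points can be checked before the job is started. The following is a minimal sketch and not a definitive check: `preflight` is a hypothetical helper name, and it runs with the caller's own credentials, so an empty result does not prove that the Transcribe service itself can read the object. It only uses the `head_object` and `head_bucket` calls of the boto3 S3 client.

```python
def preflight(s3_client, media_bucket, media_key, output_bucket):
    """Return a list of problems found; an empty list means every check passed.

    s3_client is expected to be a boto3 S3 client, e.g. boto3.client("s3").
    """
    problems = []

    # Point 1: an empty output bucket name is reported without calling S3.
    if not output_bucket:
        problems.append("OutputBucketName is empty")

    # Confirm the media object is visible to the current credentials.
    try:
        s3_client.head_object(Bucket=media_bucket, Key=media_key)
    except s3_client.exceptions.ClientError as err:
        problems.append(f"cannot read s3://{media_bucket}/{media_key}: {err}")

    # Point 2: confirm the output bucket exists and is reachable.
    if output_bucket:
        try:
            s3_client.head_bucket(Bucket=output_bucket)
        except s3_client.exceptions.ClientError as err:
            problems.append(f"cannot reach bucket {output_bucket}: {err}")

    return problems
```

In a Lambda handler this could be called just before `start_transcription_job`, logging the returned list and skipping the job when it is not empty.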

Upvotes: 0

John Rotenstein

Reputation: 269520

It works for me using this format:

Media={
    'MediaFileUri': f'https://s3-us-west-2.amazonaws.com/{BUCKET}/{KEY}'
},
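
To build that value from the S3 event rather than by hand, something like the sketch below may help. The console URL in the question (`us-west-2.console.aws.amazon.com/...`) addresses a web page, not the object itself. The helper names here are hypothetical, and the region default is an assumption taken from the question. `s3_uri` returns the `s3://bucket/key` form shown in the StartTranscriptionJob documentation, and `https_uri` returns the path-style form used above.

```python
from urllib.parse import unquote_plus


def s3_uri(bucket, key):
    """Build an s3:// URI for the media object."""
    return f"s3://{bucket}/{key}"


def https_uri(bucket, key, region="us-west-2"):
    """Build the path-style HTTPS form shown in this answer."""
    return f"https://s3-{region}.amazonaws.com/{bucket}/{key}"


def media_from_record(record):
    """Extract (bucket, key, media_format) from one S3 event record."""
    bucket = record["s3"]["bucket"]["name"]
    # Keys arrive URL-encoded in S3 event notifications (a space becomes "+").
    key = unquote_plus(record["s3"]["object"]["key"])
    # Take the text after the last dot, lower-cased, as the media format.
    media_format = key.rsplit(".", 1)[-1].lower()
    return bucket, key, media_format
```

In the handler from the question, `media_from_record(event["Records"][0])` would replace the manual lookups, and either URI helper could stand in for `create_uri`.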

Upvotes: 3
