Reputation: 2662
I have an audio file in S3
.
I don't know the language of the audio file. So I need to use IdentifyLanguage
for start_transcription_job()
.
LanguageCode
will be blank since I don't know the language of the audio file.
Envirionment
Using
Python 3.8 runtime,
boto3 version 1.16.5
,
botocore version: 1.19.5
,
no Lambda Layer.
Here is my code for the Transcribe job:
mediaFileUri = 's3://'+ bucket_name+'/'+prefixKey
transcribe_client = boto3.client('transcribe')
response = transcribe_client.start_transcription_job(
TranscriptionJobName="abc",
IdentifyLanguage=True,
Media={
'MediaFileUri':mediaFileUri
},
)
Then I get this error:
{
"errorMessage": "Parameter validation failed:\nMissing required parameter in input: \"LanguageCode\"\nUnknown parameter in input: \"IdentifyLanguage\", must be one of: TranscriptionJobName, LanguageCode, MediaSampleRateHertz, MediaFormat, Media, OutputBucketName, OutputEncryptionKMSKeyId, Settings, ModelSettings, JobExecutionSettings, ContentRedaction",
"errorType": "ParamValidationError",
"stackTrace": [
" File \"/var/task/app.py\", line 27, in TranscribeSoundToWordHandler\n response = response = transcribe_client.start_transcription_job(\n",
" File \"/var/runtime/botocore/client.py\", line 316, in _api_call\n return self._make_api_call(operation_name, kwargs)\n",
" File \"/var/runtime/botocore/client.py\", line 607, in _make_api_call\n request_dict = self._convert_to_request_dict(\n",
" File \"/var/runtime/botocore/client.py\", line 655, in _convert_to_request_dict\n request_dict = self._serializer.serialize_to_request(\n",
" File \"/var/runtime/botocore/validate.py\", line 297, in serialize_to_request\n raise ParamValidationError(report=report.generate_report())\n"
]
}
With this error, means that I must specify the LanguageCode
and IdentifyLanguage
is an invalid parameter.
100% sure the audio file exist in S3. But without LanguageCode
it don't work, and IdentifyLanguage
parameter is unknown parameter
I using SAM application to test locally using this command:
sam local invoke MyHandler -e lambda\TheDirectory\event.json
And I cdk deploy
, and check in Aws Lambda Console as well, tested it the same events.json
, but still getting the same error
This I think is Lambda Execution environment, I didn't use any Lambda Layer.
I look at this docs from Aws Transcribe:
https://docs.aws.amazon.com/transcribe/latest/dg/API_StartTranscriptionJob.html
and this docs of boto3
:
Clearly state that LanguageCode
is not required and IdentifyLanguage
is a valid parameter.
So what I missing out? Any idea on this? What should I do?
Update:
I keep searching and asked couple person online, I think I should build the function container 1st to let SAM package the boto3 into the container.
So what I do is, cdk synth
a template file:
cdk synth --no-staging > template.yaml
Then:
sam build --use-container
sam local invoke MyHandler78A95900 -e lambda\TheDirectory\event.json
But still, I get the same error, but post the stack trace as well
[ERROR] ParamValidationError: Parameter validation failed:
Missing required parameter in input: "LanguageCode"
Unknown parameter in input: "IdentifyLanguage", must be one of: TranscriptionJobName, LanguageCode, MediaSampleRateHertz, MediaFormat, Media, OutputBucketName, OutputEncryptionKMSKeyId, Settings, JobExecutionSettings, ContentRedaction
Traceback (most recent call last):
File "/var/task/app.py", line 27, in TranscribeSoundToWordHandler
response = response = transcribe_client.start_transcription_job(
File "/var/runtime/botocore/client.py", line 316, in _api_call
return self._make_api_call(operation_name, kwargs)
File "/var/runtime/botocore/client.py", line 607, in _make_api_call
request_dict = self._convert_to_request_dict(
File "/var/runtime/botocore/client.py", line 655, in _convert_to_request_dict
request_dict = self._serializer.serialize_to_request(
File "/var/runtime/botocore/validate.py", line 297, in serialize_to_request
raise ParamValidationError(report=report.generate_report())
Really no clue what I doing wrong here. I also report a github issue here, but seem like cant reproduce the issue.
Main Question/Problem:
Unable to start_transription_job
without LanguageCode
with IdentifyLanguage=True
What possible reason cause this, and how can I solve this problem(Dont know the languange of the audio file, I want to identify language of audio file without given the LanguageCode) ?
Upvotes: 1
Views: 1050
Reputation: 2662
End up I notice this is because my packaged lambda function isn’t being uploaded for some reason. Here is how I solved it after getting help from couple of people.
First modify CDK stack which define my lambda function like this:
from aws_cdk import (
aws_lambda as lambda_,
core
)
from aws_cdk.aws_lambda_python import PythonFunction
class MyCdkStack(core.Stack):
def __init__(self, scope: core.Construct, id: str, **kwargs) -> None:
super().__init__(scope, id, **kwargs)
# define lambda
my_lambda = PythonFunction(
self, 'MyHandler',
entry='lambda/MyHandler',
index='app.py',
runtime=lambda_.Runtime.PYTHON_3_8,
handler='MyHandler',
timeout=core.Duration.seconds(10)
)
This will use aws-lambda-python module ,it will handle installing all required modules into the docker.
Next, cdk synth a template file
cdk synth --no-staging > template.yaml
At this point, it will bundling all the stuff inside entry
path which define in PythonFunction
and install all the necessary dependencies defined in requirements.txt
inside that entry
path.
Next, build the docker container
$ sam build --use-container
Make sure template.yaml
file in root directory. This will build a docker container, and the artifact will build inside .aws-sam/build
directory in my root directory.
Last step, invoke the function using sam:
sam local invoke MyHandler78A95900 -e path\to\event.json
Now finally successfully call start_transcription_job
as stated in my question above without any error.
In Conclusion:
pip install boto3
, this only will
install the boto3
in my local system.sam local invoke
without build the container 1st by sam build --use-container
sam build
at last, but in that point, I didn't
bundle what defined inside requirements.txt
into the
.aws-sam/build
, therefore need to use aws-lambda-python
module as stated above.Upvotes: 0
Reputation: 601
Check whether you are using the latest boto3 version.
boto3.__version__
'1.16.5'
I tried it and it works.
import boto3
transcribe = boto3.client('transcribe')
response = transcribe.start_transcription_job(TranscriptionJobName='Test-20201-27',IdentifyLanguage=True,Media={'MediaFileUri':'s3://BucketName/DemoData/Object.mp4'})
print(response)
{
"TranscriptionJob": {
"TranscriptionJobName": "Test-20201-27",
"TranscriptionJobStatus": "IN_PROGRESS",
"Media": {
"MediaFileUri": "s3://BucketName/DemoData/Object.mp4"
},
"StartTime": "datetime.datetime(2020, 10, 27, 15, 41, 2, 599000, tzinfo=tzlocal())",
"CreationTime": "datetime.datetime(2020, 10, 27, 15, 41, 2, 565000, tzinfo=tzlocal())",
"IdentifyLanguage": "True"
},
"ResponseMetadata": {
"RequestId": "9e4f94a4-20e4-4ca0-9c6e-e21a8934084b",
"HTTPStatusCode": 200,
"HTTPHeaders": {
"content-type": "application/x-amz-json-1.1",
"date": "Tue, 27 Oct 2020 14:41:02 GMT",
"x-amzn-requestid": "9e4f94a4-20e4-4ca0-9c6e-e21a8934084b",
"content-length": "268",
"connection": "keep-alive"
},
"RetryAttempts": 0
}
}
Upvotes: 1