AWS Lambda code to convert MP3 input for Lex

Question

I have a solution that will reside on a user’s local mobile device, and I want it to post audio content to Lex using the AWS REST API. The problem is that the solution can’t stream audio (up or down) and has almost no audio manipulation capabilities locally. Lex, however, has very specific input requirements and streams its output.

So access will be via an API Gateway acting as a Proxy with a Lambda (Python 2.7) function to deal with the audio issues.

The output is all taken care of: the Lambda code saves the AudioStream into a file and sends that file as the response body, and this works fine. However, I can’t get the input to work.

The input audio is an MP3 file sent as the body of a POST request and I need to get this into a format acceptable to Lex.
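
As a minimal sketch of the first step, assuming the API Gateway uses Lambda proxy integration with binary media types enabled, the MP3 body arrives base64-encoded in the event's `body` field with `isBase64Encoded` set to true. The helper name `extract_audio_bytes` is hypothetical, and the sketch is written for Python 3 rather than the Python 2.7 runtime mentioned above.

```python
import base64


def extract_audio_bytes(event):
    """Return the raw request body from an API Gateway proxy event."""
    body = event.get("body") or ""
    if event.get("isBase64Encoded"):
        # Binary payloads are delivered base64-encoded by the proxy integration.
        return base64.b64decode(body)
    # Text bodies arrive as str; encode so callers always get bytes.
    return body.encode("utf-8") if isinstance(body, str) else body


def lambda_handler(event, context):
    audio = extract_audio_bytes(event)
    # Placeholder response: a real handler would convert and forward the audio.
    return {"statusCode": 200, "body": str(len(audio))}
```

The handler here only reports the payload size; conversion and the call to Lex would replace the placeholder.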

I’ve investigated the following approaches:

Native AWS
Use S3 and Elastic Transcoder - when transcoding to PCM the lowest allowed sample rate is 22050, but Lex requires 16000. It also doesn’t seem to allow transcoding to Opus format.

Use MediaConvert - I couldn’t see a setting to convert to PCM or Opus.

Native Python
Python doesn’t seem to have the ability to unpack MP3 natively. I’ve read that decoding MP3 in pure Python would be very slow and not worth doing.

Import a library
Use something like ffmpeg-python or ffmpy - but this involves creating a deployment package or similar. I could go down this road, but it really seems overly complicated for what I want to do.

Use something other than Python
I chose Python as I’m more familiar with coding in it for Lambda, but perhaps C#, Node or Java 8 has something available that would make this easy in a Lambda function.

At the moment I’m looking at doing the following:

1. Use Python to save the MP3 file to an S3 bucket
2. Have Elastic Transcoder convert that MP3 to PCM at a 22050 sample rate (but with all other settings set as Lex needs)
3. Have the Lambda read the transcoded file back from S3
4. Use the wave library (import wave) to read the file and then write it back with a sample rate of 16000 (this is the step I’m unsure about)
5. Post the file (with the correct sample rate) to Lex

Of course there will be some latency issues here, but as long as they’re not too severe I’m willing to live with them. This does seem overly complex for what I thought would be a fairly simple task. It's the best I’ve come up with so far, but even proving it out will take a number of hours' work, and I’ve spent days on this already.

So the main question is whether the Python wave library can be used in AWS Lambda to modify the sample rate in this way.
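
One point worth illustrating: the wave module reads and writes WAV headers and raw frames, but it does not resample. The sketch below (helper names are hypothetical) copies the frames unchanged and only declares a new rate in the header, which changes the playback duration instead of producing true 16000 Hz audio.

```python
import io
import wave


def make_silence(rate, seconds):
    """Build a mono 16-bit WAV of silence for the demonstration."""
    out = io.BytesIO()
    with wave.open(out, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(b"\x00\x00" * int(rate * seconds))
    return out.getvalue()


def rewrite_header_rate(wav_bytes, new_rate):
    """Copy frames unchanged but declare a different sample rate."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as src:
        nchannels = src.getnchannels()
        sampwidth = src.getsampwidth()
        frames = src.readframes(src.getnframes())
    out = io.BytesIO()
    with wave.open(out, "wb") as dst:
        dst.setnchannels(nchannels)
        dst.setsampwidth(sampwidth)
        dst.setframerate(new_rate)
        dst.writeframes(frames)
    return out.getvalue()


def duration_seconds(wav_bytes):
    """Return the duration implied by the header's frame count and rate."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as w:
        return w.getnframes() / float(w.getframerate())


original = make_silence(22050, 2)
relabeled = rewrite_header_rate(original, 16000)
# Same number of frames, lower declared rate: the clip now plays slower.
print(duration_seconds(original), duration_seconds(relabeled))
```

So the samples themselves have to be recomputed by something other than wave before the new header is written.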

If not, is there a way of solving this by either creating a deployment package, using an AWS feature I haven’t investigated yet or a neater way of doing this in something other than Python?

The problem is that the Lex part of this app was supposed to be a nice-to-have. It’s not a main feature, and yet it’s taken up the majority of the dev time. I’m pretty close to just ditching it but thought I’d ask here first.

Matt Haughton · Accepted Answer

So it took a while but there is a way to do this.

The way I've solved it is to save the file to S3, then run it through Elastic Transcoder to get a WAV file (1 channel at a 22050 sample rate).

Then set the following variables:

inrate=22050
outrate=16000
inchannels=2
outchannels=1

This code should then bring the sample rate down to 16000:

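import audioop
import wave

# src and dst are the paths of the transcoded input WAV and the output WAV
s_read = wave.open(src, 'rb')
s_write = wave.open(dst, 'wb')

# Read every frame of the input file into memory
n_frames = s_read.getnframes()
data = s_read.readframes(n_frames)

# ratecv(fragment, width, nchannels, inrate, outrate, state)
# returns a tuple of (converted_fragment, new_state)
converted = audioop.ratecv(data, 1, inchannels, inrate, outrate, None)

# Params: (nchannels, sampwidth, framerate, nframes, comptype, compname)
s_write.setparams((outchannels, 2, outrate, 0, 'NONE', 'Uncompressed'))
s_write.writeframes(converted[0])

s_read.close()
s_write.close()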

The file is then accepted by Lex and gets a response as expected.
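
For runtimes without audioop (it was removed from the standard library in Python 3.13), a rough alternative is sketched below. This is not the method used above: it swaps in plain linear interpolation with no low-pass filter, so some aliasing is possible, and it assumes 16-bit mono PCM input. The names `resample_mono16` and `convert_wav` are hypothetical. For reference, the `ratecv` signature is `(fragment, width, nchannels, inrate, outrate, state)`, so 16-bit mono data would normally be described as width 2 and 1 channel.

```python
import array
import io
import wave


def resample_mono16(samples, in_rate, out_rate):
    """Linearly interpolate a sequence of 16-bit mono samples to out_rate."""
    out = array.array("h")
    if not samples:
        return out
    n_out = int(len(samples) * out_rate / in_rate)
    step = in_rate / float(out_rate)
    last = len(samples) - 1
    for i in range(n_out):
        pos = i * step
        left = min(int(pos), last)
        right = min(left + 1, last)
        frac = pos - left
        # Weighted average of the two neighbouring input samples.
        out.append(int(round(samples[left] * (1.0 - frac) + samples[right] * frac)))
    return out


def convert_wav(wav_bytes, out_rate=16000):
    """Return a new 16-bit mono WAV resampled to out_rate."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as src:
        if src.getnchannels() != 1 or src.getsampwidth() != 2:
            raise ValueError("expected 16-bit mono PCM")
        in_rate = src.getframerate()
        samples = array.array("h")
        samples.frombytes(src.readframes(src.getnframes()))
    resampled = resample_mono16(samples, in_rate, out_rate)
    out = io.BytesIO()
    with wave.open(out, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(out_rate)
        dst.writeframes(resampled.tobytes())
    return out.getvalue()
```

Working on in-memory bytes avoids temporary files, though a pure-Python loop will be slower than audioop on long recordings.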

There's some noticeable latency with this method: processing usually takes about 7-10 seconds according to CloudWatch Logs, so it's probably not acceptable for a production-level solution, but it's good enough for my needs.

