Reputation: 616
I have a LAMP server set up on EC2. A simple website hosted on this web server in /var/www/html/ allows a user to upload an audio file of people having a discussion via an input form:
<form action="../cgi-bin/store_mp3_view" method="post" accept-charset="utf-8" enctype="multipart/form-data">
<label for="mp3">Audio file</label>
<input type="file" id="mp3" name="filename" />
<input type="submit" value="Upload" />
</form>
This audio file gets stored in /tmp/. As you can see, the form triggers a Python script I have in cgi-bin. Here is the script: http://pastebin.com/iNU6WSUV.

The script uploads the audio file from my web server to an API by Honda (Hark), which detects utterances and produces an audio file for each utterance as well as a JSON object containing metadata for each utterance. It appears that the utterance audio files, as well as the JSON for each utterance, can be fetched separately from Honda's API: https://api.hark.jp/docs/en/05_reference_webapi.html. My script waits for all of this processing to complete (all utterances processed and ready), then retrieves each audio file and sends it to the Bing Speech API to convert the speech to text. I do this because I want to play each utterance's audio file, with its text and metadata, in the browser in the same sequence as the conversation happened, in real time. A player, if you will.

The problem is that all of this takes too long: it can take several minutes, and the browser receives a gateway timeout from the CGI script. Specifically, Hark takes a while to return the complete results of the audio analysis, but it appears I can query its API and retrieve intermediate results, as mentioned earlier. However, the utterances don't finish in order: utterance 3 may be ready before utterance 2, but I need to show 2 before 3 because a conversation has an order of utterances.

What is the best way to build an app that can do this? How can I background these API calls so they don't block and cause a timeout? Should I be using something like Flask for this web app? How can I render the results in the webpage as I iteratively poll and retrieve them from Hark? Is CGI the wrong tool for the job? Thanks.
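One way to handle utterances that finish out of order is to buffer each result by its position in the conversation and release it only once everything before it has been released. A minimal sketch, assuming each utterance can be given a 0-based index (for example from its start time in the Hark metadata; that mapping is an assumption, and InOrderBuffer is a hypothetical helper, not part of any library):

```python
from __future__ import annotations

from typing import Any


class InOrderBuffer:
    """Release out-of-order results strictly in index order (hypothetical helper).

    `index` is assumed to be the utterance's 0-based position in the conversation.
    """

    def __init__(self, start: int = 0) -> None:
        self._next = start                   # next index the player may show
        self._pending: dict[int, Any] = {}   # results that arrived too early

    def add(self, index: int, payload: Any) -> list[Any]:
        """Store one result; return every payload that can now be shown, in order."""
        self._pending[index] = payload
        ready: list[Any] = []
        # Drain consecutive indices starting from the one we are waiting for.
        while self._next in self._pending:
            ready.append(self._pending.pop(self._next))
            self._next += 1
        return ready


if __name__ == "__main__":
    buf = InOrderBuffer()
    print(buf.add(2, "utterance 2"))  # []
    print(buf.add(0, "utterance 0"))  # ['utterance 0']
    print(buf.add(1, "utterance 1"))  # ['utterance 1', 'utterance 2']
```

Each list returned by add can be sent to the browser as-is: the player never sees a gap, because nothing is released until all of its predecessors have been.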
Upvotes: 0
Views: 538
Reputation: 616
While Ali Nikneshan's answer was helpful, it seems CGI is not the right tool for the job. I decided to stop using a LAMP stack with CGI apps and set up a Tornado web server with WebSockets, which lets me easily make async calls, run background tasks, and use coroutines to set up a data pipeline that polls the API endpoint and feeds the data to the browser.
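A minimal sketch of the background-polling part, written with plain asyncio rather than Tornado so it runs on its own (Tornado 5+ runs on the asyncio event loop, so coroutines like these can be awaited from a Tornado handler). fetch_status and push are hypothetical stand-ins for the HTTP call to the Hark API and for a WebSocket send, and the {"index", "done"} status shape is an assumption, not Hark's actual response format:

```python
from __future__ import annotations

import asyncio
from typing import Awaitable, Callable

# Hypothetical signatures: fetch_status stands in for an HTTP GET against the
# Hark API; push stands in for sending one message over a WebSocket.
FetchStatus = Callable[[int], Awaitable[dict]]
Push = Callable[[dict], Awaitable[None]]


async def poll_until_ready(utterance_id: int, fetch_status: FetchStatus,
                           interval: float = 1.0) -> dict:
    """Poll one utterance until its (assumed) status dict reports done."""
    while True:
        status = await fetch_status(utterance_id)
        if status.get("done"):
            return status
        # asyncio.sleep yields to the event loop, so other requests keep being served.
        await asyncio.sleep(interval)


async def process_all(ids: list[int], fetch_status: FetchStatus, push: Push,
                      interval: float = 1.0) -> None:
    """Poll every utterance concurrently and push each result as soon as it is ready."""
    tasks = [asyncio.create_task(poll_until_ready(i, fetch_status, interval))
             for i in ids]
    for finished in asyncio.as_completed(tasks):
        # Results arrive in completion order; each carries its index so the
        # client (or a server-side buffer) can restore conversation order.
        await push(await finished)


def make_fake_fetch(polls_needed: dict[int, int]) -> FetchStatus:
    """Fake status endpoint for the demo: utterance i is done after polls_needed[i] polls."""
    calls = {i: 0 for i in polls_needed}

    async def fetch_status(utterance_id: int) -> dict:
        calls[utterance_id] += 1
        return {"index": utterance_id,
                "done": calls[utterance_id] >= polls_needed[utterance_id]}

    return fetch_status


async def main() -> list[int]:
    received: list[int] = []

    async def push(message: dict) -> None:
        received.append(message["index"])

    # Utterance 2 finishes first and utterance 0 last, mimicking out-of-order results.
    fake = make_fake_fetch({0: 3, 1: 2, 2: 1})
    await process_all([0, 1, 2], fake, push, interval=0.01)
    return received


if __name__ == "__main__":
    print(asyncio.run(main()))
```

In a Tornado WebSocketHandler, push could wrap self.write_message(json.dumps(message)), and process_all could be started with IOLoop.current().spawn_callback(...) so the handler returns immediately instead of waiting minutes for Hark.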
This presentation was quite helpful for understanding coroutines: http://www.dabeaz.com/coroutines/Coroutines.pdf
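The generator-based coroutines in that presentation can be chained into a push-style pipeline with .send(). A minimal sketch in that style, where transcribe is a hypothetical stand-in for the Bing Speech call and the utterance dict layout ("index", "audio") is assumed:

```python
from typing import Callable


def coroutine(func):
    """Decorator that primes a generator so it can accept .send() right away."""
    def start(*args, **kwargs):
        gen = func(*args, **kwargs)
        next(gen)  # advance to the first (yield)
        return gen
    return start


@coroutine
def skip_empty(target):
    """Stage: drop utterances with no audio, forward the rest."""
    while True:
        utterance = (yield)
        if utterance.get("audio"):
            target.send(utterance)


@coroutine
def add_transcript(transcribe: Callable[[str], str], target):
    """Stage: attach a transcript from a (hypothetical) transcribe function and forward."""
    while True:
        utterance = (yield)
        target.send(dict(utterance, text=transcribe(utterance["audio"])))


@coroutine
def collect(out: list):
    """Sink: stand-in for pushing each finished utterance to the browser."""
    while True:
        out.append((yield))


if __name__ == "__main__":
    shown = []
    # Pipeline: source loop -> skip_empty -> add_transcript -> collect
    pipeline = skip_empty(add_transcript(str.upper, collect(shown)))
    for utt in [{"index": 0, "audio": "hi"}, {"index": 1, "audio": ""}]:
        pipeline.send(utt)
    print(shown)  # [{'index': 0, 'audio': 'hi', 'text': 'HI'}]
```

Each stage only knows its target, so a polling loop can act as the source and a WebSocket writer as the sink without changing the stages in between.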
And for Tornado: http://www.tornadoweb.org/en/stable/index.html
Upvotes: 0
Reputation: 3502
Generally, the way to handle a long delay is to use yield and send partial data to the client. Instead of obj.wait(), you need a loop that checks whether the status is finished and, if not, prints something like "..." and sleeps for one second. This way you will not receive a timeout.
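A minimal sketch of that loop for a CGI script. is_finished is hypothetical (whatever non-blocking status check the client object offers in place of obj.wait()), and whether the dots actually reach the browser before the gateway gives up depends on the web server and any proxy passing the output through without buffering it:

```python
import sys
import time
from typing import Callable, Iterator, TextIO


def heartbeat(is_finished: Callable[[], bool], interval: float = 1.0,
              sleep: Callable[[float], None] = time.sleep) -> Iterator[str]:
    """Yield a progress marker every `interval` seconds until is_finished() is True.

    is_finished is a hypothetical stand-in for a non-blocking status check on
    the job object, used instead of the blocking obj.wait().
    """
    while not is_finished():
        yield "..."
        sleep(interval)


def stream_progress(is_finished: Callable[[], bool], out: TextIO = sys.stdout,
                    interval: float = 1.0) -> None:
    """Write each marker and flush it so the bytes leave the script immediately."""
    for marker in heartbeat(is_finished, interval):
        out.write(marker)
        out.flush()
```

In the CGI script, the Content-Type header and the blank line would be printed first, then stream_progress(...) called where obj.wait() was, and the final results printed once it returns.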
Upvotes: 1