Reputation: 1322
I have several Python scripts that follow a similar format: you pass in a date, and the script either:

- checks my S3 bucket for the file with that date in the filename and parses it, or
- runs a Python analysis script on the file for that date.

The important thing is that I need a timeout of at least 1 hour.
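Roughly, each script looks something like this (a simplified sketch rather than my exact code; the bucket name, key prefix, and the parse/analysis steps are placeholders):

```python
import sys
import boto3

BUCKET = "my-data-bucket"  # placeholder bucket name

def parse_file(body):
    # placeholder for the real parsing logic
    print(f"parsed {len(body)} bytes")

def run_analysis(date_str):
    # placeholder for the real analysis; this is the part that can take ~1 hour
    print(f"running analysis for {date_str}")

def process_date(date_str):
    """Look for a file whose name contains the date; parse it if found,
    otherwise run the long-running analysis for that date."""
    s3 = boto3.client("s3")
    resp = s3.list_objects_v2(Bucket=BUCKET, Prefix=f"data/{date_str}")
    if resp.get("KeyCount", 0) > 0:
        key = resp["Contents"][0]["Key"]
        body = s3.get_object(Bucket=BUCKET, Key=key)["Body"].read()
        parse_file(body)
    else:
        run_analysis(date_str)

if __name__ == "__main__":
    process_date(sys.argv[1])  # e.g. python script.py 2020-01-15
```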
I am looking for a serverless solution that would let me call these functions on a range of dates and run them all in parallel. Because of the long duration of my Python scripts, services like AWS Lambda and Google Cloud Functions don't work because of their timeouts (15 minutes and 9 minutes respectively). I have looked at Google Cloud Dataflow, but I am not sure whether it is overkill for my relatively simple use case.
Minimal downtime is important, so I am leaning towards a managed offering from AWS, Google Cloud, etc.
I would also like a dashboard showing the progress of each job, with logs, so I can see which dates have completed and which dates hit a bug (and what the bug was).
Upvotes: 0
Views: 384
Reputation: 1754
AWS Fargate may be a good and simple choice for running the hour-long tasks. It supports scheduled and event-based tasks as well, so you could, for instance, process a file as soon as it is uploaded to S3, or run the job on a daily basis using a cron expression.
There is more documentation on scheduled tasks in the ECS docs.
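As a rough sketch of the one-task-per-date idea (not tested against your setup; the cluster, task definition, subnet, and container names below are placeholders you would replace with your own):

```python
import boto3
from datetime import date, timedelta

ecs = boto3.client("ecs")

def run_task_for_date(date_str):
    """Launch one Fargate task that processes a single date.
    The container reads the DATE environment variable and runs the script."""
    return ecs.run_task(
        cluster="my-cluster",                    # placeholder cluster name
        launchType="FARGATE",
        taskDefinition="date-processor:1",       # placeholder task definition
        networkConfiguration={
            "awsvpcConfiguration": {
                "subnets": ["subnet-0123456789abcdef0"],  # placeholder subnet
                "assignPublicIp": "ENABLED",
            }
        },
        overrides={
            "containerOverrides": [
                {"name": "date-processor",       # placeholder container name
                 "environment": [{"name": "DATE", "value": date_str}]}
            ]
        },
    )

# Kick off one task per date in the range; Fargate runs them in parallel.
start = date(2020, 1, 1)
for i in range(31):
    run_task_for_date((start + timedelta(days=i)).isoformat())
```

Each task's output ends up in CloudWatch Logs (assuming the task definition uses the awslogs log driver), and the ECS console shows which tasks stopped successfully and which failed, which covers the per-date progress view you asked about.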
If you want more complex handling of batch processing you can use AWS Batch, though in my experience this approach requires more orchestration effort (and gives you more flexibility).
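For comparison, submitting one Batch job per date might look roughly like this (the job queue and job definition names are placeholders):

```python
import boto3
from datetime import date, timedelta

batch = boto3.client("batch")

def submit_date_job(date_str):
    """Submit one Batch job per date; the job's container picks the date
    up from the DATE environment variable."""
    return batch.submit_job(
        jobName=f"process-{date_str}",
        jobQueue="my-job-queue",            # placeholder job queue
        jobDefinition="date-processor:1",   # placeholder job definition
        containerOverrides={
            "environment": [{"name": "DATE", "value": date_str}]
        },
    )

start = date(2020, 1, 1)
for i in range(31):
    submit_date_job((start + timedelta(days=i)).isoformat())
```

The Batch console then lists each job's status (SUCCEEDED/FAILED) with a link to its logs, which maps fairly well onto the per-date dashboard requirement.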
Serverless.com has a great blog post on how to use Fargate to run long-running tasks.
Upvotes: 0
Reputation: 15276
Services such as AWS Lambda or GCP Cloud Functions are, by definition, short-running. If a computational task might run for a long time (and I consider an hour a long time), then the function-as-a-service story isn't a good match. Let us now look at what you actually want: a long-running job per date, run in parallel across a range of dates, with visibility into which dates succeeded and which failed.
One possible solution is to use GCP Compute Engine and the notion of a "managed instance group". Using this technology you define a Compute Engine template that will spin up Linux or Windows VM instances with as many (or as few) CPUs and as much RAM as needed. The number of instances is a function of how you define load ... including dropping to zero. When you define your Compute Engine template, you have 100% control over it, including defining the initial startup applications through a startup script. I could imagine you writing a startup script that runs your application.
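A minimal sketch of the startup-script idea, using a single instance insert per date rather than a full managed instance group (the project, zone, image, bucket, and script path are all placeholders):

```python
import googleapiclient.discovery

compute = googleapiclient.discovery.build("compute", "v1")

PROJECT = "my-project"   # placeholder project id
ZONE = "us-central1-a"   # placeholder zone

def create_worker(date_str):
    """Create one VM whose startup script runs the analysis for a single date
    and then deletes the instance when it finishes."""
    name = f"worker-{date_str.replace('-', '')}"
    startup_script = f"""#!/bin/bash
# Fetch and run the per-date script (placeholder bucket and script name).
gsutil cp gs://my-bucket/process.py /tmp/process.py
python3 /tmp/process.py {date_str}
# Self-delete when done; requires the VM's service account to have permission.
gcloud compute instances delete {name} --zone={ZONE} --quiet
"""
    config = {
        "name": name,
        "machineType": f"zones/{ZONE}/machineTypes/e2-standard-2",
        "disks": [{
            "boot": True,
            "autoDelete": True,
            "initializeParams": {
                "sourceImage": "projects/debian-cloud/global/images/family/debian-11"
            },
        }],
        "networkInterfaces": [{
            "network": "global/networks/default",
            "accessConfigs": [{"type": "ONE_TO_ONE_NAT", "name": "External NAT"}],
        }],
        "metadata": {"items": [{"key": "startup-script", "value": startup_script}]},
    }
    return compute.instances().insert(project=PROJECT, zone=ZONE, body=config).execute()
```

Because the VM deletes itself at the end of the script, you only pay for the time each date actually takes to process; the per-instance serial console and Cloud Logging output give you a place to see which dates finished and which errored.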
While this is indeed more work than the mantra of "you bring the code and we bring everything else", it is the state of play. Another potential solution is (as you were alluding to) to examine the nature of your processing and sub-divide it into finer-grained, smaller work units.
If Compute Engine feels like too much, an alternative would be to embrace Kubernetes and run a cluster with pods containing your application.
Upvotes: 2