Att Righ

Reputation: 1799

AWS Lambda Python startup hook?

Suppose I have some resources that I want to share between different requests in an AWS Lambda written in Python. How should I implement this?

Are there hooks for "post startup", or should I lazily create resources on the first call? The disadvantage of "lazy initializing" is that some requests will be randomly slow, because one unlucky consumer incurs the startup cost.
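For reference, "lazy initializing" here would look something like the sketch below (`build_expensive_client` is a hypothetical stand-in for whatever resource needs sharing):

```python
# Module-level cache, shared across invocations that land on the same container.
_client = None

def get_client():
    """Create the expensive resource on first use, then reuse it."""
    global _client
    if _client is None:
        _client = build_expensive_client()
    return _client

def build_expensive_client():
    # Stand-in for e.g. opening a DB connection pool or loading a model.
    return object()

def handler(event, context):
    client = get_client()  # first call pays the cost; later calls reuse it
    return {"statusCode": 200}
```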

Also... will those resources survive a Lambda execution environment being "frozen"?

This page https://docs.aws.amazon.com/lambda/latest/dg/runtimes-context.html talks about an "Init" stage. How do I execute things in the init stage? This seems to suggest that Init includes both an "unfreeze" operation and a freeze operation.

Research

Connection pooling in AWS across lambdas

Upvotes: 0

Views: 570

Answers (1)

lynkfox

Reputation: 2400

There are several methods that depend on your use case, but it is very important that Lambdas be as 'stateless' as possible. You can never guarantee that any invoke of a Lambda will reuse the same container, or that a container will remain around for the next invoke; the timing of container decommissioning is (almost) entirely controlled by the AWS backend and not something you can really predict.

Method 1, use case: Common Functionality:

If you have written a bunch of common utility functionality that you want to include across multiple Lambdas that do different things (not multiple invokes of the same Lambda), then you should zip up your helpers/utility directory and add that zip as a layer to each Lambda. Code located in a layer can be accessed just the same as any other custom code. This code is immutable for a given version of that layer, but you can upload new versions at any time. Layers are of course the same across every invoke for a given Lambda, no matter what container it's in.
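As a sketch of why layer code imports like any other code: the runtime unpacks a Python layer under `/opt/python` and puts that directory on `sys.path`. The snippet below simulates that mechanism with a temp directory standing in for `/opt/python`, and `my_helpers` is a hypothetical layer module:

```python
import os
import sys
import tempfile

# Lambda unpacks a Python layer's contents under /opt/python and adds that
# directory to sys.path. Simulate the same mechanism with a temp directory.
layer_dir = tempfile.mkdtemp()  # stands in for /opt/python
with open(os.path.join(layer_dir, "my_helpers.py"), "w") as f:
    f.write("def greet(name):\n    return 'hello ' + name\n")

sys.path.append(layer_dir)  # the runtime does this for /opt/python

import my_helpers  # imported exactly like any other custom code

print(my_helpers.greet("lambda"))
```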

Method 2: Global Variables, use case: common data that does not change during an execution

If you have some resource-intensive heavy lifting that only has to be done once for many invocations of the same Lambda, can be reused each time, and does not get changed in the process (such as parsing several instruction files for an engine to follow into memory, or loading pickled memory states), then you can put these calls at global scope (outside any def in your Python file). These are loaded/run when the container is first started. This will make your cold-start Lambdas much slower to start up, but the results are available to any invoke that uses that same container.
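A minimal sketch of Method 2, where `load_instructions` is a hypothetical stand-in for the expensive one-time work:

```python
import json

def load_instructions():
    # Stand-in for expensive one-time work, e.g. parsing instruction files
    # or unpickling a saved memory state.
    return json.loads('{"steps": ["parse", "transform", "emit"]}')

# Runs once per container, during the cold start, because it is at module
# scope (outside any def).
INSTRUCTIONS = load_instructions()

def handler(event, context):
    # Every invoke that lands on this container reuses INSTRUCTIONS for free.
    return {"steps_available": len(INSTRUCTIONS["steps"])}
```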

You can mitigate the decommissioning of containers somewhat by using CloudWatch Events to send a 'keep alive' (i.e. a blank ping event) to your Lambda on a regular basis, so that it continues to be active and the AWS backend does not decommission that container (hence the 'almost' above).

This can have the unintended consequence of keeping many containers up and running that you do not need. It also reduces some of Lambda's 'pay only for what you use' advantage, because you are effectively turning the Lambda into a server that runs constantly. There are some advanced methods that send keep-alive events while your traffic is high and stop them when your traffic drops, to let most of the containers die, but it is still up to AWS when to decommission, which means a given container may stick around far longer than you expect if AWS believes it will need it.
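If you do send keep-alive pings, the handler should recognize and short-circuit them so they don't run your real logic. A sketch, where the `"keep-alive"` marker is a convention you choose yourself, not an AWS field:

```python
def handler(event, context):
    # A scheduled CloudWatch Events rule sends {"source": "keep-alive"};
    # the field name and value are just a convention you pick.
    if event.get("source") == "keep-alive":
        # Touching the container is enough; skip the real work.
        return {"statusCode": 204, "body": "pong"}
    return do_real_work(event)

def do_real_work(event):
    # Stand-in for the Lambda's actual business logic.
    return {"statusCode": 200, "body": "handled"}
```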

This data is lost when a container is decommissioned, of course, as it lives only in memory.

Method 3: Your own Container (Use case - ... depends)

You can provide your own container image for running a Lambda. If you do this, you can do everything you would do with any other container image, such as startup scripts and the like. Combined with keep-alive events, this gives you more access to values stored in memory that can be initialized at Lambda spin-up. It is much more complicated to maintain, update, and access, but also much more powerful.

Overall

You shouldn't rely on any of the above techniques, though there is some advantage to having them in your tool belt for edge situations. Instead, you should be considering your use cases and how best to make use of cloud-native architecture. What data do you need? If it is common data that just needs to be accessed, a well-designed DynamoDB schema can retrieve it all in a single query, typically in about 200ms or less.
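As a sketch of that single-query pattern (the table name, key name, and `TENANT#` key scheme are all hypothetical): with a single-table design where every item the Lambda needs shares one partition key, one Query fetches everything at once.

```python
def build_query_params(tenant_id):
    # Low-level DynamoDB Query parameters: one request returns every item
    # sharing the partition key.
    return {
        "TableName": "app-config",  # hypothetical table name
        "KeyConditionExpression": "pk = :pk",
        "ExpressionAttributeValues": {":pk": {"S": "TENANT#" + tenant_id}},
    }

# In the Lambda, these would be passed straight to the client:
#   boto3.client("dynamodb").query(**build_query_params(tenant_id))
```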

If it's complex logic/initialization, you should consider whether Lambda is correct for what you are trying to do, or whether an EC2 cluster is a better solution, or a combination of an EC2 cluster + Lambdas + other resources.

If it's something more unusual, then you should spend some time researching other AWS services to see if one of them can provide what you need more natively in the cloud.

Upvotes: 2
