Parijat Bose

Reputation: 390

What is the best way to run python scripts in AWS?

I have three Python scripts, 1.py, 2.py, and 3.py, each taking 3 runtime arguments.

All three Python programs are independent of each other. All three may run sequentially in a batch, or only two of them may run, depending on some configuration.

Manual approach:

  1. Create an EC2 instance, run the Python script, shut the instance down.
  2. Repeat the above step for the next Python script.

The automated way would be to trigger the above process through Lambda and replicate it using some combination of services.

What is the best way to implement this in AWS?

Upvotes: 13

Views: 38965

Answers (4)

Kurt Schelfthout

Reputation: 8990

You could use meadowrun. Disclaimer: I am one of the maintainers, so obviously biased.

Meadowrun is a Python library/tool that manages EC2 instances for you, moves Python code + environment dependencies to them, and runs a function without any hassle.

For example, you could put your scripts in a Git repo and run them like so:

import asyncio

from meadowrun import AllocCloudInstance, Deployment, run_function
from script_1 import run_1

async def main():
    results = await run_function(
        # the function to run on the EC2 instance
        # ("arguments" stands in for whatever run_1 expects)
        lambda: run_1(arguments),
        # properties of the VM that runs the function
        AllocCloudInstance(
            logical_cpu_required=2,
            memory_gb_required=16,
            interruption_probability_threshold=15,
            cloud_provider="EC2",
        ),
        # code + env to deploy on the VM; there are other options here
        Deployment.git_repo(
            "https://github.com/someuser/somerepo",
            conda_yml_file="env.yml",
        ),
    )
    print(results)

asyncio.run(main())

Meadowrun will then create an EC2 instance with the given requirements for you (or reuse one if it's already there, which could be useful for running your scripts in sequence), recreate the Python code + environment there, run the function, and return any results and output.

Upvotes: 2

NicolasZ

Reputation: 983

As of 2022, depending on your infrastructure constraints, I'd say the easiest way would be to set the scripts up as Lambda functions and then call them from CloudWatch with the required parameters (create a rule):

https://docs.aws.amazon.com/AmazonCloudWatch/latest/events/RunLambdaSchedule.html

That way you can configure them to run independently or sequentially, without having to worry about setting up the infrastructure and turning it on and off.

This applies to scripts that are not too resource-intensive and that don't run for more than 15 minutes at a time (the Lambda time limit).
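
For example, a minimal sketch of what one of the scripts could look like as a Lambda handler, assuming the three runtime arguments are passed in the event payload of the CloudWatch/EventBridge rule (the argument names are placeholders):

import json

def lambda_handler(event, context):
    # the CloudWatch/EventBridge rule can pass a constant JSON payload,
    # e.g. {"arg1": "...", "arg2": "...", "arg3": "..."}
    arg1 = event["arg1"]
    arg2 = event["arg2"]
    arg3 = event["arg3"]

    # ... the original body of 1.py goes here ...

    return {"statusCode": 200, "body": json.dumps({"done": True})}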

Upvotes: 1

Felipe Guerra

Reputation: 155

You can start your EC2 instance from a Python script using the AWS boto3 library (https://aws.amazon.com/sdk-for-python/). So a possible solution would be to trigger a Lambda function periodically (you can use Amazon CloudWatch for periodic events), and inside that function boot up your EC2 instance with a Python script, as sketched below.
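
A minimal sketch of such a Lambda function, assuming the instance already exists and is stopped (the instance ID is a placeholder):

import boto3

ec2 = boto3.client("ec2")

def lambda_handler(event, context):
    # start a previously created, stopped instance (ID is a placeholder)
    ec2.start_instances(InstanceIds=["i-0123456789abcdef0"])
    return {"started": True}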

On your instance you can configure the OS to run a Python script every time it boots up; I would suggest using crontab (see this link: https://www.instructables.com/id/Raspberry-Pi-Launch-Python-script-on-startup/).

At the end of your script, you can send an Amazon SQS message to trigger a function that will shut down your first instance and then call another function that starts the second script.
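
A rough sketch of that last step, assuming the queue URL and instance ID below are placeholders you would fill in:

import boto3

sqs = boto3.client("sqs")

# notify the next step that this script is done (queue URL is a placeholder)
sqs.send_message(
    QueueUrl="https://sqs.us-east-1.amazonaws.com/123456789012/script-done",
    MessageBody='{"script": "1.py", "status": "done"}',
)

# the Lambda consuming this queue could then stop the instance, e.g.:
# boto3.client("ec2").stop_instances(InstanceIds=["i-0123456789abcdef0"])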

Upvotes: 4

au kk

Reputation: 156

AWS Batch has a DAG scheduler; technically you could define job1, job2, and job3 and tell AWS Batch to run them in that order (via job dependencies, as sketched below). But I wouldn't recommend that route.
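
For reference, a minimal sketch of chaining jobs with boto3, assuming the job queue and job definitions already exist (the names here are placeholders):

import boto3

batch = boto3.client("batch")

# submit job1, then make job2 depend on it (queue/definition names are placeholders)
job1 = batch.submit_job(jobName="job1", jobQueue="my-queue", jobDefinition="job1-def")
job2 = batch.submit_job(
    jobName="job2",
    jobQueue="my-queue",
    jobDefinition="job2-def",
    dependsOn=[{"jobId": job1["jobId"]}],
)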

For the above to work you would basically need to create 3 Docker images (image1, image2, image3) and then put them in ECR (Docker Hub can also work if you're not using the Fargate launch type).

I don't think that makes sense unless each job is bulky and has its own runtime that's different from the others.

Instead I would write a Python program that calls 1.py, 2.py, and 3.py, put that in a Docker image, and run it as an AWS Batch job or just an ECS Fargate task.

main.py:

import subprocess

# run 1.py; it will see the same stdout/stderr as main.py
exit_code = subprocess.call("python3 /path/to/1.py", shell=True)

# decide whether you want to call 2.py and so on ...
# with Batch and Fargate you can retrieve the output from CloudWatch Logs ...

Now you have a Docker image that just needs to run somewhere. Fargate is fast to start up, a bit pricey, and has a 10 GB limit on temporary storage. AWS Batch is slow to start up on a cold start, but can use Spot Instances in your account. You might need to make a custom AMI for AWS Batch to work, e.g. if you want more storage.

Note: for anyone who wants to scream about shell=True, both main.py and 1.py come from the same codebase. It's a batch job, not an internet-facing API that takes that command from a user request.

Upvotes: 9
