bumpbump

Reputation: 794

AWS Managed Airflow vs. AWS Lambda + Step Functions vs. Kubeflow on AWS EKS

This is going to be a fairly general question. I have a pipeline that I would like to execute in real time. The pipeline can have sudden and unpredictable load changes, so scalability (both up and down) is important. The pipeline stages can be packaged as Docker containers, though they don't necessarily start that way.

I see three ways to build said pipeline on AWS. 1) I can write an Airflow DAG and use Amazon Managed Workflows for Apache Airflow (MWAA). 2) I can write an AWS Lambda pipeline with AWS Step Functions. 3) I can write a Kubeflow pipeline on top of AWS EKS.

These three options have different ramifications in terms of cost and scalability, I would presume. E.g. scaling a Kubernetes cluster in AWS EKS will be a lot slower than scaling Lambda functions assuming I don't hit the service quota for Lambdas. Can someone comment on the scalability of AWS managed Airflow? Does it scale faster than EKS? How does it compare to AWS Lambdas?

Upvotes: 3

Views: 6110

Answers (1)

Josh Fell

Reputation: 3589

Why not use Airflow to orchestrate the entire pipeline? Airflow can certainly invoke a Step Function using the StepFunctionStartExecutionOperator or by writing a custom Python function to do the same with the PythonOperator.
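For the custom Python function route, here is a minimal sketch of what the callable might look like. It assumes boto3's Step Functions client and its `start_execution` call; the function names, the `airflow-` execution name prefix, and the state machine ARN are placeholders I made up for illustration, not anything from your setup.

```python
import json
import uuid


def start_pipeline_execution(sfn_client, state_machine_arn, payload):
    """Start one Step Functions execution and return its ARN.

    sfn_client is anything exposing boto3's Step Functions
    `start_execution` call, e.g. boto3.client("stepfunctions").
    """
    response = sfn_client.start_execution(
        stateMachineArn=state_machine_arn,
        # A UUID suffix is one simple way to avoid clashing with the
        # name of an earlier execution.
        name=f"airflow-{uuid.uuid4()}",
        # Step Functions takes its input as a JSON string.
        input=json.dumps(payload),
    )
    return response["executionArn"]


def trigger_from_airflow(**context):
    """Hypothetical python_callable for a PythonOperator task."""
    import boto3  # imported lazily so this module loads without AWS libraries

    client = boto3.client("stepfunctions")
    # Placeholder ARN: substitute your real state machine.
    arn = "arn:aws:states:us-east-1:123456789012:stateMachine:my-pipeline"
    return start_pipeline_execution(client, arn, {"run_id": context.get("run_id")})
```

Passing the client in as an argument keeps the function easy to test without AWS credentials. If you go with StepFunctionStartExecutionOperator instead, you hand it the state machine ARN and input directly; check the Amazon provider docs for the exact parameter names in your version.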

Seems like this solution would be the best of both worlds: true data orchestration, monitoring, and alerting in Airflow (while keeping a fairly light Airflow instance, since it's pure orchestration) with the scalability and responsiveness of AWS Lambda.

I've used this method for a very similar use case in the past and it worked like a charm. Plus, if you need to scale this pipeline to integrate with other services and systems in the future, Airflow gives you that flexibility because it's an orchestrator and is system- and provider-agnostic.

Upvotes: 1
