Reputation: 53
I have 3 container images that run my workload.
(Each of these expects the files to be present in its own file system.)
So the Airflow tasks would be:
container 1 >> container 2 >> container 3
I want to use the KubernetesPodOperator for Airflow to take advantage of the auto-scaling options of Airflow running in Kubernetes. But since a KubernetesPodOperator creates one pod per task, and each of these containers is its own task, how can I pass these files around?
I can modify the source code in each container to be aware of an intermediate location like S3 to upload files to, but is there a built-in Airflow way of doing this without modifying the worker source code?
Upvotes: 3
Views: 2455
Reputation: 30083
You can use the Amazon S3 operators in Airflow: https://airflow.apache.org/docs/apache-airflow-providers-amazon/stable/operators/s3.html
Or you can write custom boto3 code. However, if you are not looking to write code, you can use NFS or EFS services.
Read more about that here: https://medium.com/asl19-developers/create-readwritemany-persistentvolumeclaims-on-your-kubernetes-cluster-3a8db51f98e3
Since you want to scale, in this case you have to use the RWX (ReadWriteMany) access mode.
You can also check out the different NFS-style services like MinIO, GlusterFS, etc., which will provide you a PVC with the ReadWriteMany option.
Files will be persisted to the PVC disk managed by NFS (or by EFS if you are on AWS), and all pods can access those files.
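For reference, a ReadWriteMany claim like the one described above might look like this. This is only a sketch: the claim name `shared-workdir` and storage class `efs-sc` are placeholders, and you need an NFS/EFS-backed StorageClass that actually supports RWX.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-workdir        # placeholder name
spec:
  accessModes:
    - ReadWriteMany           # RWX: multiple pods can mount it read/write
  storageClassName: efs-sc    # assumes an EFS/NFS-backed StorageClass exists
  resources:
    requests:
      storage: 5Gi
```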
If you are on GCP GKE, feel free to review my other answer: How to create a dynamic persistent volume claim with ReadWriteMany access in GKE?
Upvotes: 0
Reputation: 1812
Airflow does not pass files between tasks. There is XCom, but it is not meant for files; it is for passing small pieces of data between tasks.
I would suggest S3, as you already mentioned. Another alternative is using Kubernetes-native features: you can mount the same persistent disk volume to all 3 containers, so they can read/write files in the local file system, which is actually backed by a shared file system at the cluster level. But this is a more complex setup than just using S3, so I would only do it if an S3-like system is not an option for your setup.
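To make the shared-volume idea concrete, here is a minimal sketch using plain Python dicts that mirror the raw pod-spec fields. The names `shared-workdir` and `/data` are placeholders; in real Airflow code you would build the equivalent `kubernetes.client.models` objects (`k8s.V1Volume`, `k8s.V1VolumeMount`) and pass them to each KubernetesPodOperator via its `volumes` and `volume_mounts` arguments.

```python
# Sketch of the shared-volume approach: every task's pod mounts the same
# ReadWriteMany PVC, so files written by one container are visible to the next.
# The dicts below mirror the raw pod-spec fields; with the real
# KubernetesPodOperator you would build the equivalent
# kubernetes.client.models objects instead.

def shared_volume_spec(claim_name: str, mount_path: str):
    """Return the (volume, volume_mount) pod-spec fragments for one PVC."""
    volume = {
        "name": "workdir",
        "persistentVolumeClaim": {"claimName": claim_name},
    }
    volume_mount = {
        "name": "workdir",       # must match the volume's name
        "mountPath": mount_path, # where the container sees the shared files
    }
    return volume, volume_mount

# All three tasks reuse the same fragments, so they share one file system:
volume, mount = shared_volume_spec("shared-workdir", "/data")
task_pod_specs = {
    f"container-{i}": {"volumes": [volume], "volumeMounts": [mount]}
    for i in (1, 2, 3)
}
```

Because every pod mounts the same claim at the same path, container 1 can write `/data/output.csv` and containers 2 and 3 simply read it back, with no S3 upload step in the worker code.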
Upvotes: 0