Reputation: 13088
I have a Cloud Composer cluster running about a dozen DAGs a day. They all run during a 5-hour period in the middle of the night. The biggest DAG takes ~3 hours to complete on 5 nodes, and the bulk of the work is highly parallelizable (that is, if we scaled it up to, say, 15 nodes, it would finish much sooner). To keep costs low (or possibly reduce them) and improve our throughput, it would be great if I could scale the cluster up while the big DAG is running, then scale it back down for the remaining almost 20 hours of the day when nothing is happening in the cluster. The UI only lets me scale the cluster down to 3 nodes.
My question: Is there a way to completely "shut down" the Cloud Composer cluster for part of the day? Failing that, can I at least bring it down to a single node? Ideally, this would be an automated task.
Upvotes: 2
Views: 1925
Reputation: 871
Cloud Composer also has fixed costs that you cannot do anything about.
These costs are a significant part of the bill for a small Composer cluster.
If you want to scale down to 0, I suggest running Airflow on a VM instead of in a managed Composer environment. After Airflow has completed its run, you can shut down the VM to reduce costs.
GKE (which runs Composer) cannot scale down to 0 nodes, because it is also running some Kubernetes services that need CPU and RAM.
Other than that, you should check out the link posted by SANN3, as that post gives detailed insight into how to achieve autoscaling.
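The VM approach above can be automated with two small wrappers around `gcloud compute instances start` and `stop`, called from cron or any other scheduler before and after the nightly run. This is only a minimal sketch: the VM name `airflow-vm`, the zone, and the `DRY_RUN` switch are hypothetical names introduced for illustration, and it assumes Airflow starts on boot on that VM.

```shell
# Hypothetical defaults: replace with your own VM name and zone.
VM_NAME="${VM_NAME:-airflow-vm}"
VM_ZONE="${VM_ZONE:-us-central1-a}"

# Print the command instead of executing it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Start the VM shortly before the nightly DAGs are scheduled.
start_airflow_vm() {
  run gcloud compute instances start "$VM_NAME" --zone "$VM_ZONE"
}

# Stop the VM once the last DAG run has finished.
stop_airflow_vm() {
  run gcloud compute instances stop "$VM_NAME" --zone "$VM_ZONE"
}
```

Setting `DRY_RUN=1` before calling either function shows the command that would be executed without touching the VM, which is handy when wiring the functions into a scheduler.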
Upvotes: 2
Reputation: 10099
The Traveloka team solved the same problem and wrote a detailed article about the process. In the idle case, however, they run 1 node, not zero.
https://medium.com/traveloka-engineering/enabling-autoscaling-in-google-cloud-composer-ac84d3ddd60
Upvotes: 3
Reputation: 2083
You can enable autoscaling at the node level:
Workloads > your composer cluster name > enable Autoscaling
# Fill in the placeholders below before running.
PROJECT="<your GCP project id>"
COMPOSER_NAME="<your Composer environment name>"
COMPOSER_LOCATION="<the Composer environment's location, e.g. us-central1>"
CLUSTER_ZONE="<the Composer environment's zone, e.g. us-central1-a>"

# Look up the GKE cluster behind the Composer environment and keep
# only the last path segment (the cluster name).
GKE_CLUSTER=$(gcloud composer environments describe \
  "${COMPOSER_NAME}" \
  --location "${COMPOSER_LOCATION}" \
  --format="value(config.gkeCluster)" \
  --project "${PROJECT}" | \
  grep -o '[^/]*$')

# Enable node autoscaling (1 to 10 nodes) on the default node pool.
gcloud container clusters update "${GKE_CLUSTER}" --enable-autoscaling \
  --min-nodes 1 \
  --max-nodes 10 \
  --zone "${CLUSTER_ZONE}" \
  --node-pool=default-pool \
  --project "${PROJECT}"
At the worker level, you can apply the Kubernetes Horizontal Pod Autoscaler (HPA) to the airflow-worker Deployment in Composer.
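One way to attach an HPA to the worker Deployment is `kubectl autoscale`. The following is a rough sketch rather than a tested recipe: it assumes `kubectl` is already pointed at the Composer GKE cluster (for example via `gcloud container clusters get-credentials`), and the namespace, replica bounds, CPU target, and the `DRY_RUN` switch are all hypothetical values chosen for illustration. Newer `kubectl` releases may prefer a different flag for the CPU target, so check `kubectl autoscale --help` for your version.

```shell
# Hypothetical namespace: look up the real one with `kubectl get namespaces`.
NAMESPACE="${NAMESPACE:-composer-airflow-namespace}"

# Print the command instead of executing it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Create an HPA that keeps between 1 and 10 airflow-worker pods,
# targeting 50% average CPU utilization (illustrative numbers).
enable_worker_hpa() {
  run kubectl autoscale deployment airflow-worker \
    --namespace "$NAMESPACE" \
    --min=1 --max=10 --cpu-percent=50
}
```

With node autoscaling enabled on the cluster, extra worker pods that do not fit on the existing nodes should cause the node pool to grow, and it can shrink again once the pods are scaled back.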
Upvotes: 1