How to run a scheduled containerized job and minimize GCP costs

I want to run a containerized Python job on GCP every 20 minutes. Each run takes about 10 minutes to complete.

Currently I am running it as a CronJob in a zonal GKE cluster with 1 worker node.

Is there a better approach? I am looking for the best practice that also minimizes costs.

My requirements are:

I have seen that there are other technologies such as Cloud Scheduler, Cloud Pub/Sub, and Cloud Functions, but I don't know whether they can be used for my requirements.

Also, I have seen that preemptible VMs are available on GKE at roughly 80% lower cost. My job can be stopped and restarted. It runs a single transaction: it reads some data from a DB, does some preprocessing, and at the end writes the result back to the DB.

My only concern is this: when a node is preempted by Google, how long does it take to come back? A few minutes? An hour? Also, when the GKE cluster has no nodes, will it automatically create one?

Thank you

Upvotes: 1

Views: 1988

Answers (2)

guillaume blaquiere

Reputation: 75745

A preempted instance comes back very quickly (1 minute or less) under ideal conditions, meaning there are enough resources in the zone to create your instance. However, it can take hours or days if the zone is overused. Google prioritizes regular instances at the expense of preemptible ones.

In summary, you have no guarantee, but you pay only 20% of the regular price! It is always a matter of tradeoff.
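Because there is no guarantee, the job has to tolerate being killed mid-run. The question says it runs a single transaction, so one way to get that is to let an interruption roll the transaction back and have the next scheduled run start over. Below is a minimal sketch of that idea, not the asker's actual code: it uses `sqlite3` as a stand-in for the real DB, the `input`/`output` tables and the `preprocess` function are hypothetical, and it assumes the container receives SIGTERM before it is stopped (which is what Kubernetes normally sends, although a preemption may leave very little time to react).

```python
import signal
import sqlite3


class Preempted(Exception):
    """Raised when the process is asked to terminate."""


def _on_sigterm(signum, frame):
    # Turn SIGTERM into an exception so the open transaction rolls back.
    raise Preempted()


def preprocess(value):
    # Hypothetical preprocessing step; replace with the real logic.
    return value * 2


def run_job(conn):
    """Read, preprocess and write back inside one transaction."""
    # "with conn" commits on success and rolls back on any exception,
    # including Preempted raised from the signal handler.
    with conn:
        rows = conn.execute("SELECT id, value FROM input").fetchall()
        for row_id, value in rows:
            conn.execute(
                "INSERT OR REPLACE INTO output (id, value) VALUES (?, ?)",
                (row_id, preprocess(value)),
            )
    return len(rows)


if __name__ == "__main__":
    signal.signal(signal.SIGTERM, _on_sigterm)
    # In-memory demo database; a real job would connect to its own DB.
    demo = sqlite3.connect(":memory:")
    demo.execute("CREATE TABLE input (id INTEGER PRIMARY KEY, value INTEGER)")
    demo.execute("CREATE TABLE output (id INTEGER PRIMARY KEY, value INTEGER)")
    demo.execute("INSERT INTO input VALUES (1, 10)")
    demo.commit()
    print(run_job(demo), "row(s) processed")
```

If the run is interrupted, nothing is written, so a rerun does not see half-finished results.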


If you have a container, you can shift gears and have a look at AI Platform custom container training. Ignore what the name suggests; the point is that you can provision resources on demand to run your container, and they are destroyed when the workload ends. You have a small overhead (about 2 minutes to start the container and 1 minute at the end to destroy the environment), but I'm sure that can fit your use case.
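As a quick sanity check of the numbers above against the question's schedule (a 10 minute job every 20 minutes), here is a small sketch. The function name `cycle_budget` is made up for illustration, and the overhead figures are the approximate ones quoted in this answer, not measured values.

```python
def cycle_budget(job_min, startup_min, teardown_min, interval_min):
    """Check whether one run, including overhead, fits in the schedule interval."""
    total = job_min + startup_min + teardown_min
    return {
        "total_min": total,                    # wall-clock minutes per run
        "slack_min": interval_min - total,     # idle minutes before the next run
        "fits": total <= interval_min,         # False means runs would overlap
        "runs_per_day": 24 * 60 // interval_min,
    }


if __name__ == "__main__":
    print(cycle_budget(job_min=10, startup_min=2, teardown_min=1, interval_min=20))
    # {'total_min': 13, 'slack_min': 7, 'fits': True, 'runs_per_day': 72}
```

With these figures each cycle takes about 13 minutes, leaving roughly 7 minutes of slack before the next run starts.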

Upvotes: 1

dishant makwana

Reputation: 1079

Run your cluster in GKE Autopilot mode. With GKE Autopilot, you only pay for the time during which your app is consuming resources. It is the perfect use case for a CronJob.

  • You don't need to make any changes in your code or deployment strategy. Just create a new GKE cluster and select Autopilot as the cluster type.
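For reference, a CronJob for the schedule in the question could look roughly like the manifest generated below. This is only a sketch: the job name, image path, and CPU/memory requests are placeholders, not values from the question. It is written as Python that prints JSON, which `kubectl apply -f` accepts just like YAML. Explicit resource requests are included because Autopilot bills based on what the pods request.

```python
import json


def build_cronjob(name, image, schedule="*/20 * * * *", cpu="500m", memory="512Mi"):
    """Return a batch/v1 CronJob manifest as a plain dict."""
    return {
        "apiVersion": "batch/v1",
        "kind": "CronJob",
        "metadata": {"name": name},
        "spec": {
            "schedule": schedule,           # every 20 minutes by default
            "concurrencyPolicy": "Forbid",  # skip a run if the previous one is still going
            "jobTemplate": {
                "spec": {
                    "backoffLimit": 2,      # retry a failed run a couple of times
                    "template": {
                        "spec": {
                            "restartPolicy": "OnFailure",
                            "containers": [
                                {
                                    "name": name,
                                    "image": image,
                                    "resources": {
                                        "requests": {"cpu": cpu, "memory": memory}
                                    },
                                }
                            ],
                        }
                    },
                }
            },
        },
    }


if __name__ == "__main__":
    # Placeholder name and image; substitute your own.
    manifest = build_cronjob("python-job", "gcr.io/my-project/python-job:latest")
    print(json.dumps(manifest, indent=2))
```

Redirect the output to a file such as `cronjob.json` and apply it with `kubectl apply -f cronjob.json`.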

Upvotes: 4
