Power Trader

Reputation: 19

Colab Enterprise Notebook Scheduler vs GCP Functions & Scheduler

I am trying to build data ingestion (ETL) pipelines on Google Cloud Platform. I have Python scripts that download public data, upload it to Cloud Storage, transform it, and load the result into BigQuery. These scripts have to run on a schedule (hourly and daily). We are considering two options to achieve this goal:

Option 1:

Option 2:

Which of these two options is better overall? Is there a comparison of cost, reliability & efficiency between the two methods?

I have tried both methods to build data ingestion pipelines, and both work as expected.

Upvotes: 0

Views: 169

Answers (1)

guillaume blaquiere

Reputation: 75940

I have a better proposal:

  • Package your code in a container
  • Deploy your container on Cloud Run Jobs
  • Use Cloud Scheduler to invoke the Cloud Run Jobs

Here is some explanation of that proposal:

  • A container is now a universal way to package your code. Here I propose running it on Cloud Run Jobs, but you could also deploy the same container on a Kubernetes cluster (GKE, for instance), on Compute Engine, or on any other service that accepts containers.
  • Cloud Run Jobs is designed to run batch workloads efficiently at scale, in serverless mode. It is the next generation of Cloud Functions (the Cloud Functions gen2 backend is Cloud Run!), and jobs can run for up to 7 days.
  • Cloud Scheduler is the right service for scheduling. Have it invoke the Cloud Run Jobs execution API over HTTP, secured with an OAuth token for a service account that has the Cloud Run Invoker role.
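To make the proposal more concrete, here is a minimal sketch of what the container's entry point could look like, plus a helper that builds the URL Cloud Scheduler would call. It is an illustration under assumptions, not a definitive setup: the `SOURCES` variable, the `ingest` placeholder, and the `shard` helper are hypothetical names, and the URL assumes the Cloud Run Admin API v2 `jobs.run` method. `CLOUD_RUN_TASK_INDEX` and `CLOUD_RUN_TASK_COUNT` are set by Cloud Run Jobs for each task.

```python
import os


def run_job_url(project: str, region: str, job: str) -> str:
    """URL that starts one execution of a job (assumes the Admin API v2)."""
    return (
        "https://run.googleapis.com/v2/"
        f"projects/{project}/locations/{region}/jobs/{job}:run"
    )


def shard(items, task_index, task_count):
    """Return the items this task handles when the job runs several tasks."""
    return [item for i, item in enumerate(items) if i % task_count == task_index]


def ingest(source: str) -> None:
    # Placeholder for the real work: download the data, upload it to
    # Cloud Storage, transform it, and load it into BigQuery.
    print(f"ingesting {source}")


def main() -> None:
    # Cloud Run Jobs sets these two variables for each task of an execution.
    task_index = int(os.environ.get("CLOUD_RUN_TASK_INDEX", "0"))
    task_count = int(os.environ.get("CLOUD_RUN_TASK_COUNT", "1"))
    # SOURCES is a hypothetical comma-separated list configured on the job.
    sources = [s for s in os.environ.get("SOURCES", "").split(",") if s]
    for source in shard(sources, task_index, task_count):
        # An uncaught exception here exits non-zero, which fails the task.
        ingest(source)


if __name__ == "__main__":
    main()
```

With this layout, the hourly and daily schedules can be two Cloud Scheduler entries that POST to the URLs of two jobs built from the same image, each with its own `SOURCES` value.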

Upvotes: 1
