Nicky Feller

Reputation: 3899

Run continuous python script on GCP

I am pulling weather data from an API. I wrote a script that fetches the data every 15 minutes. What is the best practice for running this script on Google's App Engine?

Upvotes: 1

Views: 659

Answers (2)

Ani

Reputation: 1457

Assuming you don't want to rewrite your script in another language (e.g. JavaScript, which would open up Cloud Functions or Google Apps Script), the question is what you actually want to do with the fetched data and whether you already use an App Engine app or a VM.

You can use an App Engine app in the Python standard environment for just this feature. Basically, you would write a request handler that fetches the data, and configure cron.yaml to schedule a cron job. Your handler will then receive an HTTP request on that schedule and perform an outbound request via URL Fetch (urlfetch.fetch()). See the docs for its limitations (e.g. port restrictions). For this setup I also suggest configuring a task queue so that only one request is handled at any time, and adding an (exponential?) back-off in case a request fails.

Also keep in mind that with basic scaling the default idle_timeout before an instance is shut down is 5 minutes, and a new instance is billed a minimum of 15 minutes of instance time at startup. Since cron jobs don't run at exactly the scheduled second but are distributed slightly around it, this can lead to additional costs depending on your configuration. So it might make sense either to increase idle_timeout in a basic-scaling configuration to 16 or 17 minutes, or to schedule your task every 13.5 minutes or so.
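A minimal sketch of that setup, assuming the Python 2 standard environment with webapp2; the /tasks/fetch-weather path and the API endpoint are placeholders:

```yaml
cron:
- description: fetch weather data
  url: /tasks/fetch-weather
  schedule: every 15 minutes
```

```python
import webapp2
from google.appengine.api import urlfetch

class FetchWeatherHandler(webapp2.RequestHandler):
    def get(self):
        # Cron requests arrive as plain GETs on the scheduled URL.
        result = urlfetch.fetch('https://example.com/weather-api', deadline=30)
        if result.status_code != 200:
            # A non-2xx response signals failure, so retry/back-off logic can kick in.
            self.error(500)
            return
        # ... process/store result.content here ...
        self.response.write('ok')

app = webapp2.WSGIApplication([('/tasks/fetch-weather', FetchWeatherHandler)])
```

App Engine adds the X-Appengine-Cron header to cron requests, so you can restrict the handler to cron-only traffic (or protect it with login: admin in app.yaml, which cron can still reach).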

If the fetch() restrictions don't meet your requirements, you might want to consider either the flexible environment or a VM.
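On a VM (or in the flexible environment) the script can simply run as a long-lived loop. A minimal sketch, assuming Python 2 to match the above; the endpoint is a placeholder:

```python
import time
import urllib2  # or the `requests` library, if installed

INTERVAL_SECONDS = 15 * 60

def fetch_weather():
    response = urllib2.urlopen('https://example.com/weather-api', timeout=30)
    return response.read()

while True:
    try:
        data = fetch_weather()
        # ... process/store data here ...
    except Exception as e:
        # Log and keep going; a real script should back off on repeated failures.
        print('fetch failed: %s' % e)
    time.sleep(INTERVAL_SECONDS)
```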

Upvotes: 2

Burke9077

Reputation: 382

I have done exactly what you're asking here in the past: pull weather data (likely from a .gov source), do some processing on it, and store it in a database.

I started with a Python/cron combo but had issues tracking down which part failed when something went wrong. There were many times when data that should have been available was not.

In my case I was on AWS, so I used Lambda, but Google Cloud Platform's Cloud Functions is similar. I kicked individual functions off with Jenkins using its scheduled triggers, then tracked their completion to ensure each run succeeded. If a function fails, I can easily see in Jenkins which specific part of the process failed.
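The GCP equivalent would be an HTTP-triggered Cloud Function that Jenkins hits on a schedule. A minimal sketch, assuming the Python runtime with the requests library declared in requirements.txt; the API URL is a placeholder:

```python
import requests

def fetch_weather(request):
    """Cloud Function entry point; `request` is a Flask request object."""
    resp = requests.get('https://example.com/weather-api', timeout=30)
    resp.raise_for_status()  # surface HTTP errors so the scheduler sees the failure
    # ... process/store resp.json() here ...
    return 'ok', 200
```

Because the function returns a non-2xx status on failure, the Jenkins job that invokes it can mark the run as failed and point you at the broken step.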

Upvotes: 0
