Reputation: 163
Notebooks on Google Cloud Platform have been great for Python development in the cloud, but the last missing piece is running existing notebooks on a schedule. There are a million different tools (Airflow, Papermill, Google Cloud Jobs, Google Cloud Scheduler, Google Cloud Cron Jobs), and as someone not that familiar with Cloud, it's really easy to get lost. Any suggestions? Thanks guys!
Upvotes: 5
Views: 3321
Reputation: 31
This schedule option only works for managed notebooks. Unfortunately, it does not work for user-managed notebooks.
Upvotes: 3
Reputation: 12808
If you create a managed notebook on GCP, you can now schedule a notebook execution from within the notebook environment itself.
Create a Jupyter environment with Managed Notebooks:
See also: Schedule managed notebook quickstart
Upvotes: 2
Reputation: 1464
The two main options seem to be either manually configuring Jupyter Notebooks to run on a schedule, or letting Cloud Composer do the heavy lifting.
Regarding the manual route, there are several write-ups: how to manipulate Jupyter Notebook to run on a schedule; a plugin for scheduling files for recurring execution; how to schedule a recurring Python script (converted from Jupyter) on GCP; using Cloud Scheduler for the cron job; and combining Cloud Functions, Pub/Sub, and Cloud Scheduler. See also this Stack Overflow thread, “How to run a Python notebook daily automatically”.
Cloud Composer offers a less manual approach and is more scalable if need be; refer to this Stack Overflow thread for more information.
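For the "Python script converted from Jupyter" route, a minimal sketch might look like the following. It assumes `jupyter nbconvert` is installed on the machine and that the notebook uses a Python kernel (so the exported file ends in `.py`); the function names and the `report.ipynb` file name are made up for illustration.

```python
import subprocess
import sys
from pathlib import Path


def nbconvert_command(notebook, output_dir="."):
    """Build the nbconvert command that exports a notebook as a plain script."""
    return [
        "jupyter", "nbconvert",
        "--to", "script",
        "--output-dir", output_dir,
        notebook,
    ]


def convert_and_run(notebook, output_dir="."):
    """Export the notebook to a .py file, then run it with the current interpreter."""
    subprocess.run(nbconvert_command(notebook, output_dir), check=True)
    # Assumes a Python-kernel notebook, so nbconvert writes <stem>.py
    script = Path(output_dir) / (Path(notebook).stem + ".py")
    subprocess.run([sys.executable, str(script)], check=True)
    return script


# Example call (hypothetical file name):
# convert_and_run("report.ipynb", output_dir="build")
```

A cron entry or a Cloud Scheduler job can then call a small wrapper script that invokes `convert_and_run`, which is roughly what the manual approaches above boil down to.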
To execute a specific notebook you can use Papermill and point D2. Here is a very extensive article on scheduled execution using Papermill. Check out this Google Cloud blog for an example and more information.
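As a rough illustration of the Papermill route, the sketch below runs a notebook and writes the executed copy to a dated file, so each scheduled run leaves its own output. `papermill.execute_notebook(input_path, output_path, parameters=...)` is Papermill's API; the helper names, the `runs` directory, and the file naming scheme are assumptions of mine, not something from the linked articles.

```python
from datetime import date
from pathlib import Path, PurePosixPath


def dated_output_path(input_path, run_date, output_dir="runs"):
    """Return an output path such as runs/daily-2024-01-02.ipynb (naming is an assumption)."""
    stem = PurePosixPath(input_path).stem
    return str(PurePosixPath(output_dir) / f"{stem}-{run_date.isoformat()}.ipynb")


def run_notebook(input_path, run_date=None, parameters=None):
    """Execute a notebook with Papermill and return the path of the executed copy."""
    import papermill as pm  # third-party: pip install papermill

    run_date = run_date or date.today()
    output_path = dated_output_path(input_path, run_date)
    Path(output_path).parent.mkdir(parents=True, exist_ok=True)
    # Parameters are injected after the notebook cell tagged "parameters", if one exists.
    pm.execute_notebook(input_path, output_path, parameters=parameters or {})
    return output_path


# Example call (hypothetical notebook and parameter name):
# run_notebook("reports/daily.ipynb", parameters={"region": "emea"})
```

Whatever scheduler you pick (cron, Cloud Scheduler, Airflow) only has to call `run_notebook` on its schedule.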
There is a “Jupyter Notebook Manifesto: Best practices that can improve the life of any developer using Jupyter notebooks” blog post by Google Cloud that explains the product in depth and can be found here.
Upvotes: 2
Reputation: 2116
This post on Medium, "How to Deploy and Schedule Jupyter Notebook on Google Cloud Platform", describes how to run Jupyter notebook jobs on a Compute Engine instance and schedule them using GCP's Cloud Scheduler > Cloud Pub/Sub > Cloud Functions.
If you want to use Cloud Composer, then you might find this answer to a related question, "ETL in Airflow aided by Jupyter Notebooks and Papermill," useful.
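To give a feel for the Cloud Functions end of that chain, here is a minimal sketch of a Pub/Sub-triggered background function. The `(event, context)` signature and the base64-encoded `event["data"]` field are how Pub/Sub background functions receive messages; the JSON payload with a `notebook` field and the injectable `runner` callback are assumptions for illustration, not what the Medium post does.

```python
import base64
import json


def parse_pubsub_event(event):
    """Decode the base64-encoded JSON payload carried by a Pub/Sub event."""
    raw = base64.b64decode(event["data"]).decode("utf-8")
    return json.loads(raw)


def handle_schedule(event, context=None, runner=print):
    """Entry point: read which notebook to run and hand it to `runner`.

    `runner` is a placeholder for whatever actually starts the job,
    for example starting the Compute Engine instance that runs the notebook.
    """
    payload = parse_pubsub_event(event)
    notebook = payload["notebook"]  # hypothetical field set in the Cloud Scheduler job body
    runner(notebook)
    return notebook
```

The Cloud Scheduler job would publish a message body like `{"notebook": "reports/daily.ipynb"}` to the topic that triggers this function.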
Upvotes: 4