APB

Reputation: 45

Scheduling a Cloud Dataflow Job

I have already finished creating a job in Dataflow. The job runs an ETL process from PostgreSQL to BigQuery. However, I don't know how to schedule it using Airflow. Can anyone share how to schedule a Dataflow job with Airflow?

Thank you

Upvotes: 0

Views: 600

Answers (2)

Mazlum Tosun

Reputation: 6582

In your Airflow DAG, you can define a cron-based schedule with the schedule_interval param:

import airflow
from airflow.providers.apache.beam.operators.beam import BeamRunPythonPipelineOperator
from airflow.providers.google.cloud.operators.dataflow import DataflowConfiguration

# args, python_main_file and dataflow_job_options are assumed to be defined elsewhere
with airflow.DAG(
        'my_dag',
        default_args=args,
        schedule_interval="5 3 * * *"  # cron expression: every day at 03:05
) as dag:

    # Trigger the Dataflow job with an operator
    launch_dataflow_job = BeamRunPythonPipelineOperator(
        runner='DataflowRunner',
        py_file=python_main_file,      # main file of the Beam pipeline
        task_id='launch_dataflow_job',
        pipeline_options=dataflow_job_options,
        py_system_site_packages=False,
        py_interpreter='python3',
        dataflow_config=DataflowConfiguration(
            location='region'          # e.g. 'europe-west1'
        )
    )

    launch_dataflow_job
    # ......
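With this setup, Airflow triggers the DAG once a day: the cron expression 5 3 * * * fires at 03:05, and the BeamRunPythonPipelineOperator submits the Beam pipeline to the Dataflow service via the DataflowRunner.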

Upvotes: 1

Rathish Kumar B

Reputation: 1422

You can schedule Dataflow batch jobs using Cloud Scheduler (a fully managed cron job scheduler) or Cloud Composer (a fully managed workflow orchestration service built on Apache Airflow).

To schedule using Cloud Scheduler, refer to Schedule Dataflow batch jobs with Cloud Scheduler.
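As an illustration of the Cloud Scheduler route, here is a minimal sketch using the google-cloud-scheduler client library, assuming the ETL pipeline has already been staged as a classic Dataflow template; the project, region, bucket, template path, and service account below are placeholders, not values from the question:

import json

from google.cloud import scheduler_v1

client = scheduler_v1.CloudSchedulerClient()
parent = client.common_location_path("my-project", "us-central1")  # placeholders

# HTTP job that calls the Dataflow templates.launch REST endpoint on a cron schedule
job = scheduler_v1.Job(
    name=f"{parent}/jobs/launch-postgres-to-bq-etl",
    schedule="5 3 * * *",  # every day at 03:05
    time_zone="Etc/UTC",
    http_target=scheduler_v1.HttpTarget(
        uri=(
            "https://dataflow.googleapis.com/v1b3/projects/my-project"
            "/locations/us-central1/templates:launch"
            "?gcsPath=gs://my-bucket/templates/postgres_to_bq"  # placeholder template
        ),
        http_method=scheduler_v1.HttpMethod.POST,
        body=json.dumps({"jobName": "postgres-to-bq-etl", "parameters": {}}).encode(),
        # Service account used to authenticate against the Dataflow API (placeholder)
        oauth_token=scheduler_v1.OAuthToken(
            service_account_email="scheduler-sa@my-project.iam.gserviceaccount.com"
        ),
    ),
)

client.create_job(request={"parent": parent, "job": job})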

To schedule using Cloud Composer, refer to Launching Dataflow pipelines with Cloud Composer using DataflowTemplateOperator.
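For the Cloud Composer route, a minimal sketch of a DAG built around DataflowTemplateOperator could look like this; the DAG id, template path, project, and region are placeholders:

from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.dataflow import DataflowTemplateOperator

with DAG(
    'launch_dataflow_template',        # hypothetical DAG id
    schedule_interval='5 3 * * *',     # same cron syntax as above: daily at 03:05
    start_date=datetime(2023, 1, 1),
    catchup=False,
) as dag:

    launch_template = DataflowTemplateOperator(
        task_id='launch_template',
        template='gs://my-bucket/templates/postgres_to_bq',  # placeholder template path
        parameters={},                 # runtime parameters for the template, if any
        location='us-central1',        # placeholder region
        project_id='my-project',       # placeholder project
    )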

For examples and more ways to run Dataflow jobs in Airflow using the Java/Python SDKs, refer to Google Cloud Dataflow Operators.

Upvotes: 1
