Bob

Reputation: 375

BigQuery SQL job dependency on Dataflow pipeline

I have an Apache Beam pipeline in Python that, for whatever reason, has a flow like the one below.

from google.cloud import bigquery
import apache_beam as beam

client = bigquery.Client()
query_job1 = client.query('create table sample_table_1 as select * from table_1')
result1 = query_job1.result()

with beam.Pipeline(options=options) as p:

    records = (
            p
            | 'Data pull' >> beam.io.Read(beam.io.BigQuerySource(...))
            | 'Transform' >> ....
            | 'Write to BQ' >> beam.io.WriteToBigQuery(...)
    )

query_job2 = client.query('create table sample_table_2 as select * from table_2')
result2 = query_job2.result()

SQL Job --> Data pipeline --> SQL Job

This sequence works fine when I run it locally. However, when I try to run this as a Dataflow pipeline, it doesn't actually run in this order.

Is there a way to enforce these dependencies while running on Dataflow?

Upvotes: 2

Views: 289

Answers (1)

Alexandre Moraes

Reputation: 4051

As @PeterKim mentioned, the processing flow you described in the comment section cannot be achieved with Dataflow alone. Currently, the Dataflow programming model does not support it.

You can use Cloud Composer to orchestrate sequential job executions that depend on one another.
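A minimal sketch of what such a Composer (Airflow) DAG could look like, assuming your Beam pipeline is packaged as a Python file uploaded to GCS (the path gs://my-bucket/pipeline.py, the DAG id, job names, and region below are placeholders) and that the Google provider operators BigQueryInsertJobOperator and DataflowCreatePythonJobOperator are available in your Composer environment:

    from datetime import datetime

    from airflow import DAG
    from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator
    from airflow.providers.google.cloud.operators.dataflow import DataflowCreatePythonJobOperator

    with DAG(
        dag_id="bq_dataflow_bq",
        start_date=datetime(2021, 1, 1),
        schedule_interval=None,  # trigger manually
    ) as dag:

        # First SQL job: must finish before the Beam pipeline starts
        sql_job_1 = BigQueryInsertJobOperator(
            task_id="sql_job_1",
            configuration={
                "query": {
                    "query": "create table sample_table_1 as select * from table_1",
                    "useLegacySql": False,
                }
            },
        )

        # Dataflow pipeline: launched only after sql_job_1 succeeds
        dataflow_job = DataflowCreatePythonJobOperator(
            task_id="dataflow_job",
            py_file="gs://my-bucket/pipeline.py",  # hypothetical location of your Beam code
            job_name="beam-transform",
            location="us-central1",
        )

        # Second SQL job: runs only after the Dataflow pipeline finishes
        sql_job_2 = BigQueryInsertJobOperator(
            task_id="sql_job_2",
            configuration={
                "query": {
                    "query": "create table sample_table_2 as select * from table_2",
                    "useLegacySql": False,
                }
            },
        )

        # Encode the dependency chain: SQL job -> Dataflow -> SQL job
        sql_job_1 >> dataflow_job >> sql_job_2

With this layout, the ordering you relied on locally is enforced by the orchestrator rather than by the Python script itself, so each step only starts after the previous one has completed successfully.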

Upvotes: 2
