Divan Vermeulen

Reputation: 1

How to join columns from one table to another in BigQuery using Apache Beam (Python) for a Dataflow pipeline

I have two tables stored in BigQuery and want to join the columns from one table onto the other. This needs to be done using Apache Beam (Python) in a Dataflow pipeline on Google Cloud Platform, but I cannot find an approach for doing this with Apache Beam. WriteToBigQuery only appends rows, which is not what I need: I need to add columns from another table. Both tables use the same primary keys. Any help will be appreciated.

FEEDBACK: See the responses below from Guillaume. They solved my problem and were a better approach than using Apache Beam and Dataflow!

Upvotes: 0

Views: 571

Answers (1)

Vibhor Gupta

Reputation: 699

You can try the following snippet. It reads from BigQuery in Dataflow and joins the two tables in the query itself; the resulting rows can then be written to a new BigQuery table:

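    import apache_beam as beam

    # The join runs inside BigQuery. Project, table and column names are
    # placeholders; a hyphenated column name must be quoted with backticks.
    query = '''
        SELECT a.Coll1, b.Coll2
        FROM `PROJ.dataset.table-a` AS a
        JOIN `PROJ.dataset.table-b` AS b
          ON a.`coll-join` = b.`coll-join`
    '''

    # p1 is an existing beam.Pipeline. Newer Beam releases deprecate
    # BigQuerySource in favour of beam.io.ReadFromBigQuery.
    data_loading = (
        p1
        | 'ReadBQ' >> beam.io.Read(beam.io.BigQuerySource(query=query, use_standard_sql=True))
    )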
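
The snippet above only covers the read step. A minimal sketch of the full read-join-write flow might look like the following. The key column `id`, the columns `col_a` and `col_b`, the `STRING` types in the schema, the destination table `PROJ:dataset.table_joined`, and the helper names `build_join_query` and `run` are all assumptions made for illustration and are not from the question. Running a query-based read also typically needs a `--temp_location` (a GCS path) among the pipeline options.

```python
def build_join_query(left, right, key, left_cols, right_cols):
    """Return a standard-SQL query that joins two tables on a shared key column."""
    select = [f"a.`{key}`"]
    select += [f"a.`{c}`" for c in left_cols]
    select += [f"b.`{c}`" for c in right_cols]
    return (
        f"SELECT {', '.join(select)} "
        f"FROM `{left}` AS a "
        f"JOIN `{right}` AS b ON a.`{key}` = b.`{key}`"
    )


def run(argv=None):
    # Imported here so the query helper above can be used without Beam installed.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    # Hypothetical table and column names; replace with your own.
    query = build_join_query(
        "PROJ.dataset.table-a",
        "PROJ.dataset.table-b",
        key="id",
        left_cols=["col_a"],
        right_cols=["col_b"],
    )

    with beam.Pipeline(options=PipelineOptions(argv)) as p:
        (
            p
            # BigQuery executes the join; Beam only sees the joined rows.
            | "ReadJoined" >> beam.io.ReadFromBigQuery(query=query, use_standard_sql=True)
            # Write the widened rows to a separate table (assumed schema).
            | "WriteJoined" >> beam.io.WriteToBigQuery(
                "PROJ:dataset.table_joined",
                schema="id:STRING,col_a:STRING,col_b:STRING",
                create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
                write_disposition=beam.io.BigQueryDisposition.WRITE_TRUNCATE,
            )
        )
```

Calling `run()` with the usual Dataflow options would launch the pipeline. Writing to a new table with `WRITE_TRUNCATE` sidesteps the append-only behaviour mentioned in the question, since the joined result replaces the destination table's contents instead of adding rows to either source table.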

Upvotes: 0
