Reputation: 1
I have two tables stored in BigQuery and want to join the columns from one table onto the other. This needs to be done using Apache Beam (Python) for a Dataflow pipeline on Google Cloud Platform. I just cannot find an approach to do this with Apache Beam. WriteToBigQuery only appends rows, which is not what I need: I need to add columns from another table. Both tables use the same primary keys. Any help will be appreciated.
FEEDBACK: See the responses below from Guillaume. They solved my problem and were a better approach than using Apache Beam and Dataflow!
Upvotes: 0
Views: 571
Reputation: 699
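You can try the following snippet. It reads from BigQuery in Dataflow and joins the two tables inside the query itself, so the pipeline receives rows that are already joined. Those rows can then be written to a new BigQuery table in a later step: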
import apache_beam as beam
data_loading = (
    p1
    | 'ReadBQ' >> beam.io.ReadFromBigQuery(
        query='''SELECT a.Coll1, b.Coll2 FROM `PROJ.dataset.table-a` AS a
                 JOIN `PROJ.dataset.table-b` AS b ON a.`coll-join` = b.`coll-join`''',
        use_standard_sql=True)
)
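If the join has to happen inside the pipeline rather than in the query, one possible approach is to key both tables on the shared primary key and combine them with `CoGroupByKey`. The following is only a rough sketch under assumptions: `merge_rows`, `build_join` and all of its parameters (`query_a`, `query_b`, `key_column`, `output_table`, `schema`) are hypothetical names that are not part of the original question, and the write settings (truncate and recreate the output table) are one choice among several.

```python
def merge_rows(element):
    """Flatten one CoGroupByKey result into joined row dicts (inner join)."""
    key, groups = element
    # `groups` maps each input tag ('a' or 'b') to the rows sharing this key.
    for left in groups['a']:
        for right in groups['b']:
            # Columns from table b are added to the row from table a.
            yield {**left, **right}


def build_join(p, query_a, query_b, key_column, output_table, schema):
    """Attach the join to pipeline `p`. Every argument name here is hypothetical."""
    # Imported here so merge_rows can be tried out without Beam installed.
    import apache_beam as beam

    def keyed(label, query):
        # Read one table and turn each row into a (primary key, row) pair.
        return (
            p
            | f'Read{label}' >> beam.io.ReadFromBigQuery(
                query=query, use_standard_sql=True)
            | f'Key{label}' >> beam.Map(lambda row: (row[key_column], row))
        )

    return (
        {'a': keyed('A', query_a), 'b': keyed('B', query_b)}
        | 'Join' >> beam.CoGroupByKey()
        | 'Merge' >> beam.FlatMap(merge_rows)
        | 'Write' >> beam.io.WriteToBigQuery(
            output_table,
            schema=schema,
            create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
            write_disposition=beam.io.BigQueryDisposition.WRITE_TRUNCATE)
    )
```

Because `merge_rows` only emits a row when both sides have a match, this behaves like an inner join; keys present in only one table are dropped. Note that the result goes to a separate output table, since `WriteToBigQuery` writes rows and does not add columns to an existing table in place.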
Upvotes: 0