MadRed

Reputation: 71

Beam Dataflow Pipeline Table Creation Sink as Bigquery from GCS

I want to create a Beam Dataflow job that loads data from GCS into BigQuery. I will have hundreds of Parquet files spread across different folders in GCS. Is it possible to load files from different folders in one job, and is it possible to create the BigQuery dataset and tables in the Beam code itself?

My end goal is a pipeline that loads data from GCS into BigQuery. Thanks in advance.

Upvotes: 0

Views: 199

Answers (2)

Vibhor Gupta

Reputation: 699

An alternative solution: use gsutil to move all the files from the different GCS folders into a single folder. Once all the files are in one folder, you can easily read the data from GCS and load it into BigQuery.

Upvotes: 0

Kenn Knowles

Reputation: 6023

Yes, this is a perfect fit for Dataflow. You can use FileIO to read from GCS and BigQueryIO to write to BigQuery.
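As a rough illustration, here is a minimal sketch using the Beam Python SDK. It is only one way to do it and makes several assumptions: it uses ReadAllFromParquet from the Python SDK's parquetio module rather than FileIO directly, the bucket name, folder names, table name, and schema are hypothetical placeholders, and the BigQuery dataset is assumed to exist already (CREATE_IF_NEEDED is used here only to create the table).

```python
from typing import List, Optional


def folder_patterns(bucket: str, folders: List[str]) -> List[str]:
    """Build one glob pattern per GCS folder (bucket/folder names are hypothetical)."""
    return [f"gs://{bucket}/{folder.strip('/')}/*.parquet" for folder in folders]


def run(patterns: List[str], table: str, schema: str,
        pipeline_args: Optional[List[str]] = None) -> None:
    # Beam is imported here so the helper above works without Beam installed.
    import apache_beam as beam
    from apache_beam.io.parquetio import ReadAllFromParquet
    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions(pipeline_args)
    with beam.Pipeline(options=options) as p:
        (
            p
            # One element per folder pattern, so all folders are read in one job.
            | "FolderPatterns" >> beam.Create(patterns)
            # Each Parquet row is emitted as a Python dict.
            | "ReadParquet" >> ReadAllFromParquet()
            | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
                table,  # e.g. "my-project:my_dataset.my_table" (hypothetical)
                schema=schema,  # e.g. "name:STRING,amount:FLOAT" (hypothetical)
                # Creates the table if it is missing; the dataset is assumed to exist.
                create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            )
        )
```

To run it on Dataflow you would pass the usual pipeline options (for example --runner=DataflowRunner, --project, --region, and --temp_location) through pipeline_args. The schema string has to match the columns in your Parquet files.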

Upvotes: 0
