Reputation: 49
Piggybacking off of this post, I want to create a Beam Dataflow job to load data from GCS into BigQuery. There are thousands of files in the GCS bucket, all of them quite large and containing compressed JSONL data. The data format does not give me a date field I can use to create a partitioned table, so I would like to add my own during the pipeline.
Is it possible to add a manual field in the pipeline, separate from the compressed data, so that it shows up in the resulting BigQuery table when I load the data from GCS? I would like to do this without having to unzip any of the files or run a subsequent SELECT <CRITERIA>
statement on the table itself.
Upvotes: 0
Views: 40
Reputation: 393
Yes, you can easily do something like this:
import json

import apache_beam as beam


class CustomParsing(beam.DoFn):
    def to_runner_api_parameter(self, unused_context):
        return "beam:transforms:custom_parsing:custom_v0", None

    def process(self, element, timestamp=beam.DoFn.TimestampParam, window=beam.DoFn.WindowParam):
        # ReadFromText yields str lines, so no decoding is needed here.
        parsed = json.loads(element)
        # Add the extra field. In a plain batch read the element timestamp may
        # not be meaningful, so you could substitute a wall-clock value instead,
        # e.g. datetime.datetime.utcnow().isoformat().
        parsed["datetimefield"] = timestamp.to_rfc3339()
        yield parsed

...

with beam.Pipeline(options=pipeline_options) as p:
    (
        p
        | "ReadFromGCS" >> beam.io.ReadFromText("gs://bucket/*.json")
        | "CustomParse" >> beam.ParDo(CustomParsing())
        | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
            BIGQUERY_TABLE,
            schema=BIGQUERY_SCHEMA,
            create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
        )
    )
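Note that beam.io.ReadFromText decompresses gzip files transparently when the compression can be inferred from the file extension. If your objects are gzip-compressed but do not carry a .gz suffix, you can force decompression with the compression_type argument; a minimal sketch, assuming gzip and using a placeholder file pattern:

from apache_beam.io.filesystem import CompressionTypes

    # Placeholder pattern; adjust to your bucket layout.
    lines = p | "ReadFromGCS" >> beam.io.ReadFromText(
        "gs://bucket/*.jsonl",
        compression_type=CompressionTypes.GZIP,  # default AUTO detects by extension
    )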
The datetimefield column will then be populated for every row written to the BigQuery table, as long as the field is also declared in BIGQUERY_SCHEMA.
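If you also want the table partitioned on that new field when Beam creates it, WriteToBigQuery accepts additional_bq_parameters that are applied at table-creation time. A sketch, assuming datetimefield is declared as TIMESTAMP in BIGQUERY_SCHEMA and DAY partitioning is what you want:

        | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
            BIGQUERY_TABLE,
            schema=BIGQUERY_SCHEMA,
            # Only applied when the sink creates the table.
            additional_bq_parameters={
                "timePartitioning": {"type": "DAY", "field": "datetimefield"}
            },
            create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
        )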
Upvotes: 0