Reputation: 1012
My use case is quite simple. A data provider puts CSV files on Google Storage every hour. Once a day, a Transfer job picks up those files and loads them into a BigQuery table. The problem is that I need to rename those fields.
The first thing that came to me was simply using a scheduled query like
SELECT bad_field AS good_field (...) FROM raw_table
and scheduling it to append the results to a nicer table. I'm not sure that's a very clean solution, though, since I'm just starting out with BigQuery.
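Roughly, the full scheduled statement I have in mind would look like this (the project, dataset and field names below are just placeholders):
-- Append the renamed fields from the raw table into the nicer table (all names are placeholders)
INSERT INTO `myproject.mydataset.nicer_table` (good_field_1, good_field_2)
SELECT
  bad_field_1 AS good_field_1,
  bad_field_2 AS good_field_2
FROM `myproject.mydataset.raw_table`;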
I have also read about Dataflow, but I have no experience with that service.
Has anybody had this sort of scenario? How did you deal with it?
Upvotes: 0
Views: 291
Reputation: 7277
You can schedule a local cron job to do this for you and load your data into a BigQuery table with whatever field names you want. All you have to do is specify the schema your CSV files should be parsed against, and the columns will be named accordingly.
Let's say your CSV file has 3 columns and you want them to be named X, Y, Z. Then you can schedule this bq command to do the job for you:
bq --location=US \
load \
--replace=false \
--source_format=CSV \
--skip_leading_rows=1 \
yourdataset.tablename \
gs://yourbucket/date/*.csv \
--schema=X:STRING,Y:FLOAT,Z:STRING
Please note the following:
- The --skip_leading_rows=1 parameter assumes that the first row is a header and simply skips it.
- The --replace=false parameter makes the job append to the target table instead of overwriting it.
- The --schema parameter specifies the schema as a comma-separated list of KEY:FORMAT pairs.
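For the scheduling part, a crontab entry along these lines should run the load once a day (the time and the daily folder pattern are just an example; adjust the path to however your provider names the folders, and note that % must be escaped as \% inside crontab):
# Run the load every day at 02:00 (dataset, table and bucket names are placeholders)
0 2 * * * bq --location=US load --replace=false --source_format=CSV --skip_leading_rows=1 yourdataset.tablename "gs://yourbucket/$(date +\%Y-\%m-\%d)/*.csv" --schema=X:STRING,Y:FLOAT,Z:STRING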
Hope it helps.
Upvotes: 2