Urvah Shabbir
Urvah Shabbir

Reputation: 985

Bigquery - Version Control Scheduled Queries

Right now I have scheduled queries via BQ interface. They work but do not scale or migrate very well (across dev and prod gcp projects). So I am trying to do scheduled queries in a way that is reproducible, scalable and migrate-able.

My queries are complicated and hence I am struggling with ', " and ''' to make it run via bq commands and schedule via github actions.

This is the query which is most complicated:

declare bq_last_id int64;
declare external_sql string;
set bq_last_id = (select max(id) from bq_dataset.bq_table);
set external_sql = '"select * from mysql_table where id > ('|| bq_last_id ||')"';
execute immediate 'select * from external_query("my-gcp-project.my-region.my-connection-name",'|| external_sql || ');'

In total there are 20 something queries that have to scheduled. This one is the only one which is incremental or other are drop and recreate table again so they are not as complicated as this one.

WHAT I HAVE TRIED TILL NOW:

Any help with correct syntax or better suggestions to do this will be super helpful to me.

Thanks.

Upvotes: 1

Views: 1263

Answers (1)

ewertonvsilva
ewertonvsilva

Reputation: 1945

The way I use to solve problems with scape characters when inserting queries in bq commands is using jq on shell and my queries on a file, as following:

  1. create queries.sql file with your query script:
cat queries.sql
declare bq_last_id int64;
declare external_sql string;
set bq_last_id = (select max(id) from bq_dataset.bq_table);
set external_sql = '"select * from mysql_table where id > ('|| bq_last_id ||')"';
execute immediate 'select * from external_query("my-gcp-project.my-region.my-connection-name",'|| external_sql || ');'
  1. Create the following script: schedule_query.sh
#!/bin/bash
set -f #avoind * used as wildcard
json=$(jq -nc --arg query "$(<queries.sql)" '{ "query": $query }')

#adapt the command and param to work in your environment (destination, tables, etc...)
bq mk \
--transfer_config \
--target_dataset=mydataset \
--display_name='My Scheduled Query' \
--params="$json" \
--data_source=scheduled_query \
--service_account_name=abcdef-test-sa@abcdef-test.iam.gserviceaccount.com

Upvotes: 1

Related Questions