Reputation: 1037
I am trying to create a scheduled query using the bq
command-line tool storing the query to be executed on a sql file named scheduled_query.sql
, but I am not able to have it working. I am checking this documentation.
I want to schedule the following query:
INSERT INTO github_actions_test_dataset.timestamp_to_unix
SELECT
CURRENT_DATETIME("Europe/Madrid") AS timestamp,
UNIX_MILLIS(CURRENT_TIMESTAMP()) AS unix_miliseconds
For this, I have executed the following command with success:
bq mk --transfer_config --display_name='Example Scheduled Query' --target_dataset=github_actions_test_dataset [email protected] --data_source=scheduled_query --params='{"query":"INSERT INTO github_actions_test_dataset.timestamp_to_unix SELECT CURRENT_DATETIME(\"Europe/Madrid\") AS timestamp, UNIX_MILLIS(CURRENT_TIMESTAMP()) AS unix_miliseconds"}'
However, instead of having to write the query in the command I want to retrieve it from a .sql
file, and feed it to the command. I tried using sed 's/"/\\"/g' scheduled_query.sql
in order to escape de "
character in my query like this:
bq mk --transfer_config --display_name='Example Scheduled Query' --target_dataset=$DATASET --service_account_name=github-actions-sa@my-project.iam.gserviceaccount.com --data_source=scheduled_query --params='{"query":"'$(sed 's/"/\\"/g' scheduled_query.sql)'"}'
But again received
Too many positional args, still have ['SELECT', 'CURRENT_DATETIME(\\"Europe/Madrid\\")', 'AS', 'timestamp,', 'UNIX_MILLIS(CURRENT_TIMESTAMP())', 'AS', 'unix_miliseconds"}']
The solution may be more related on how to concatenate quoted strings on a bash command. What am I doing wrong? Note that I want to use the bq mk
command, not the bk query
one.
Upvotes: 2
Views: 892
Reputation: 1037
I finally managed to use the bq mk
command after applying a couple transformations to the content of the file:
QUERY=$(tr -s '[:space:]' ' ' < scheduled_query.sql | sed 's/"/\\"/g')
PARAMS='{"query":"'$QUERY'"}'
bq mk --transfer_config --display_name="Example scheduled query" --location=EU --data_source=scheduled_query --schedule='every day 06:45' --params="$PARAMS"
I will explain them:
tr -s '[:space:]' ' ' < scheduled_query.sql
: as the query needed to be inserted on a json-like object, the new line characters and tabs where causing a Too many positional args
error. To solve it, I substituted any consecutive number of spacing characters with a blank. Most probably there is a better alternative that keeps the original format, but after several days of trying things this worked for me.sed 's/"/\\"/g'
: also, as we are providing the query inside a json we need to scape all the double quotes inside my query so that they don't mess up with the field delimiter ones.With this I was able to load any query I have tried without problems. The only limitation I have found has been when the query contains a line comment. As we are getting rid of the new line characters, if a line comment appears on the query it will comment out all the lines which were after it, so you will have to manually remove them or apply any other rule that automatically finds and delete them.
BONUS:
I used this piece of code to create a scheduled query from Github Actions
. If you came here for the same reason, you can have a look to the logic I implemented to see if it fits your use case:
QUERY_DISPLAYNAME="Example scheduled query"
QUERY=$(tr -s '[:space:]' ' ' < $GITHUB_WORKSPACE/queries/scheduled_query.sql | sed 's/"/\\"/g')
PARAMS='{"query":"'$QUERY'"}'
RESULT=$(bq ls --format=json --transfer_config --transfer_location=eu | jq -r '.[] | select(.displayName=="'"$QUERY_DISPLAYNAME"'")')
if ! [[ "$RESULT" ]];
then
bq mk --transfer_config --display_name="$QUERY_DISPLAYNAME" --location=EU --service_account_name=${{ secrets.SA_NAME }} --data_source=scheduled_query --schedule='every day 06:45' --params="$PARAMS"
else
QUERY_NAME=$(echo $RESULT | jq -r '.name')
bq update --transfer_config --params="$PARAMS" $QUERY_NAME
fi
Upvotes: 1
Reputation: 3034
Try the following approach:
QUERY="$(cat test_sql.sql)"
bq query \
--append_table=true \
--destination_table=elZagales:so_test.timestamp_to_unix \
--display_name='test run' \
--schedule='every 15 minutes' \
--use_legacy_sql=false \
$QUERY
Utilizing this approach you simply just pass it as a query that has been set as a shell variable.
This results in the creation of a scheduled query here:
Upvotes: 1