Luiscri
Luiscri

Reputation: 1037

Create scheduled query with bq mk command-line tool from a SQL file

I am trying to create a scheduled query using the bq command-line tool storing the query to be executed on a sql file named scheduled_query.sql, but I am not able to have it working. I am checking this documentation.

I want to schedule the following query:

INSERT INTO github_actions_test_dataset.timestamp_to_unix
SELECT
    CURRENT_DATETIME("Europe/Madrid") AS timestamp, 
    UNIX_MILLIS(CURRENT_TIMESTAMP()) AS unix_miliseconds

For this, I have executed the following command with success:

bq mk --transfer_config --display_name='Example Scheduled Query' --target_dataset=github_actions_test_dataset [email protected] --data_source=scheduled_query --params='{"query":"INSERT INTO github_actions_test_dataset.timestamp_to_unix SELECT CURRENT_DATETIME(\"Europe/Madrid\") AS timestamp, UNIX_MILLIS(CURRENT_TIMESTAMP()) AS unix_miliseconds"}'

However, instead of having to write the query in the command I want to retrieve it from a .sql file, and feed it to the command. I tried using sed 's/"/\\"/g' scheduled_query.sql in order to escape de " character in my query like this:

bq mk --transfer_config --display_name='Example Scheduled Query' --target_dataset=$DATASET --service_account_name=github-actions-sa@my-project.iam.gserviceaccount.com --data_source=scheduled_query --params='{"query":"'$(sed 's/"/\\"/g' scheduled_query.sql)'"}'

But again received

Too many positional args, still have ['SELECT', 'CURRENT_DATETIME(\\"Europe/Madrid\\")', 'AS', 'timestamp,', 'UNIX_MILLIS(CURRENT_TIMESTAMP())', 'AS', 'unix_miliseconds"}']

The solution may be more related on how to concatenate quoted strings on a bash command. What am I doing wrong? Note that I want to use the bq mk command, not the bk query one.

Upvotes: 2

Views: 892

Answers (2)

Luiscri
Luiscri

Reputation: 1037

I finally managed to use the bq mk command after applying a couple transformations to the content of the file:

QUERY=$(tr -s '[:space:]' ' ' < scheduled_query.sql | sed 's/"/\\"/g')
PARAMS='{"query":"'$QUERY'"}'
bq mk --transfer_config --display_name="Example scheduled query" --location=EU --data_source=scheduled_query --schedule='every day 06:45' --params="$PARAMS"

I will explain them:

  • tr -s '[:space:]' ' ' < scheduled_query.sql : as the query needed to be inserted on a json-like object, the new line characters and tabs where causing a Too many positional args error. To solve it, I substituted any consecutive number of spacing characters with a blank. Most probably there is a better alternative that keeps the original format, but after several days of trying things this worked for me.
  • sed 's/"/\\"/g' : also, as we are providing the query inside a json we need to scape all the double quotes inside my query so that they don't mess up with the field delimiter ones.

With this I was able to load any query I have tried without problems. The only limitation I have found has been when the query contains a line comment. As we are getting rid of the new line characters, if a line comment appears on the query it will comment out all the lines which were after it, so you will have to manually remove them or apply any other rule that automatically finds and delete them.

BONUS:

I used this piece of code to create a scheduled query from Github Actions. If you came here for the same reason, you can have a look to the logic I implemented to see if it fits your use case:

QUERY_DISPLAYNAME="Example scheduled query"
QUERY=$(tr -s '[:space:]' ' ' < $GITHUB_WORKSPACE/queries/scheduled_query.sql | sed 's/"/\\"/g')
PARAMS='{"query":"'$QUERY'"}'
RESULT=$(bq ls --format=json --transfer_config --transfer_location=eu | jq -r '.[] | select(.displayName=="'"$QUERY_DISPLAYNAME"'")')
if ! [[ "$RESULT" ]];
then
    bq mk --transfer_config --display_name="$QUERY_DISPLAYNAME" --location=EU --service_account_name=${{ secrets.SA_NAME }} --data_source=scheduled_query --schedule='every day 06:45' --params="$PARAMS"
else
    QUERY_NAME=$(echo $RESULT | jq -r '.name')  
    bq update --transfer_config --params="$PARAMS" $QUERY_NAME
fi

Upvotes: 1

Daniel Zagales
Daniel Zagales

Reputation: 3034

Try the following approach:

QUERY="$(cat test_sql.sql)"

bq query \
--append_table=true \
--destination_table=elZagales:so_test.timestamp_to_unix \
--display_name='test run' \
--schedule='every 15 minutes' \
--use_legacy_sql=false \
$QUERY

Utilizing this approach you simply just pass it as a query that has been set as a shell variable.

This results in the creation of a scheduled query here: enter image description here

Upvotes: 1

Related Questions