Reputation: 93
I have a need to (programmatically) analyse the details of Scheduled Queries in BigQuery (e.g. which tables are updated and which tables are accessed in the SQL). I have done something similar for BQ Tables/Views using Apps Script's BigQuery.Tables.list(), but I cannot find an API to access the Scheduled Queries.
The UI is able to list them, so I feel this should be possible programmatically, e.g. via a REST API. Does anyone know if this is possible, which interfaces are supported (Apps Script, REST, ...), and possibly have an example of how to use it?
Upvotes: 9
Views: 10027
Reputation: 525
The answers above are excellent for using the REST API. For completeness, I'd like to include the CLI approach for solving the same thing. Personally I find this better suited for shell scripts, but YMMV.
Example: List the Scheduled Queries of the default project.
bq ls --transfer_config --transfer_location=US --format=prettyjson
Example: Details of a Scheduled Query from the default project.
bq show --format=prettyjson --transfer_config [RESOURCE_NAME]
# RESOURCE_NAME is a value you can get from the above bq ls command.
Further details can be found here.
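As an illustration, here is a minimal sketch (assuming jq is installed, the default project, and the US transfer location; the dataSourceId field name is taken from the API response shown in another answer) that loops over the scheduled queries and dumps each one's full configuration:
#!/bin/bash
# List transfer configs as JSON, keep only scheduled queries, and show each one's details
for resource in $(bq ls --transfer_config --transfer_location=US --format=json \
    | jq -r '.[] | select(.dataSourceId == "scheduled_query") | .name'); do
    bq show --format=prettyjson --transfer_config "$resource"
done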
Upvotes: 5
Reputation: 141
Here is a shell script that will notify you in Slack if any of your scheduled queries fail. Simply integrate it into your existing workflow (as blocking or non-blocking) or run it on a separate cron job. I have used httpie to send the HTTP POST, but you can also use curl or other tools, and you can swap the HTTP POST for any other action.
See the link for the possible states of the transfer state object.
#!/bin/bash
sudo apt-get install httpie
location=US
authToken="$(gcloud auth print-access-token)"
projectId=$(gcloud config get-value project 2>/dev/null)
scheduled_queries=$(curl -H "Authorization: Bearer $authToken" https://bigquerydatatransfer.googleapis.com/v1/projects/$projectId/locations/$location/transferConfigs?dataSourceIds=scheduled_query)
# pretty print results
echo $scheduled_queries | python -m json.tool
# count how many times FAILED appears in the response
length=$(echo "$scheduled_queries" | grep -c FAILED)
if [ "$length" -gt 0 ]; then
echo A SCHEDULED TRANSFER HAS FAILED
http POST https://hooks.slack.com/services/<your slack channel> text="A SCHEDULED TRANSFER HAS FAILED: HERE IS STDOUT >>> $scheduled_queries"
else
echo No errors in scheduled transfers
fi
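As a variant of the failure check above, here is a small sketch (assuming jq is installed; the field names follow the API response shown in another answer) that extracts only the failed configs, so the Slack message can carry just the relevant entries rather than the whole response:
# keep only the configs whose state is FAILED and list their display names
failed=$(echo "$scheduled_queries" | jq -r '.transferConfigs[]? | select(.state == "FAILED") | .displayName')
if [ -n "$failed" ]; then
    echo "FAILED SCHEDULED TRANSFERS: $failed"
fi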
If running outside the cloud or on something that hasn't already been authenticated, you will need to authenticate beforehand. To do this, run the following before the script above:
echo "$PROD_KEY" > temp_json.json
gcloud auth activate-service-account --key-file=temp_json.json --project=$PROJ_NAME_PROD
where $PROD_KEY is the service key for whatever you are trying to access.
Upvotes: 0
Reputation: 7058
Scheduled queries are part of BigQuery's Data Transfer Service, so you have to use its API; in particular, the projects.transferConfigs.list method. Fill in the dataSourceIds field with scheduled_query and parent with projects/PROJECT_ID. As discussed in the comments, if you are using a regional location such as europe-west2 instead of a multi-regional one (EU or US), you should use projects.locations.transferConfigs.list instead. The parent resource will then be in the form projects/PROJECT_ID/locations/REGIONAL_LOCATION.
In addition, for other transfers you can get the corresponding dataSourceIds using the projects.dataSources.list method. That's how I got the scheduled_query one.
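For reference, a minimal sketch of that dataSources.list call with curl (assuming gcloud credentials and the US multi-regional location):
authToken="$(gcloud auth print-access-token)"
projectId=$(gcloud config get-value project 2>/dev/null)
# list the data source IDs available for the project (scheduled_query among them)
curl -H "Authorization: Bearer $authToken" \
https://bigquerydatatransfer.googleapis.com/v1/projects/$projectId/locations/us/dataSources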
The response from transferConfigs.list will be an array of scheduled queries such as:
{
    "name": "projects/<PROJECT_NUMBER>/locations/us/transferConfigs/<TRANSFER_CONFIG_ID>",
    "destinationDatasetId": "<DATASET>",
    "displayName": "hacker-news",
    "updateTime": "2018-11-14T15:39:18.897911Z",
    "dataSourceId": "scheduled_query",
    "schedule": "every 24 hours",
    "nextRunTime": "2019-04-19T15:39:00Z",
    "params": {
        "write_disposition": "WRITE_APPEND",
        "query": "SELECT @run_time AS time,\n title,\n author,\n text\nFROM `bigquery-public-data.hacker_news.stories`\nLIMIT\n 1000",
        "destination_table_name_template": "hacker_daily_news"
    },
    "state": "SUCCEEDED",
    "userId": "<USER_ID>",
    "datasetRegion": "us"
}
Example of an API call with bash and curl:
#!/bin/bash
# parameter(s)
location=europe-west2
authToken="$(gcloud auth print-access-token)"
projectId=$(gcloud config get-value project 2>/dev/null)
# API call
scheduled_queries=$(curl -H "Authorization: Bearer $authToken" \
https://bigquerydatatransfer.googleapis.com/v1/projects/$projectId/locations/$location/transferConfigs?dataSourceIds=scheduled_query)
# pretty print results
echo $scheduled_queries | python -m json.tool
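To get at the detail the original question is after (the SQL behind each scheduled query), a small follow-up sketch, assuming jq is installed, extracts the query text from the response above:
# print each scheduled query's display name followed by its SQL (params.query)
echo "$scheduled_queries" | jq -r '.transferConfigs[] | "\(.displayName):\n\(.params.query)\n"'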
Upvotes: 7