Reputation: 315
Is it possible to have all BigQuery requests logged to a file in Cloud Storage (or, even better, into a BigQuery table)? The --apilog option available in bq seems to be intended mainly for debugging, but what I'd like to do is keep track of all queries, much like logging all access requests on a particular file in Cloud Storage.
To be more specific, I don't just want to log my own queries, but (a) queries by all users within the same project, and ideally also (b) queries by anyone touching a table in a dataset that I own.
Upvotes: 12
Views: 14975
Reputation: 71
There's a better way to do this now with the INFORMATION_SCHEMA tables.
Here's a simple way to get all queries from a project in the last 90 days:
SELECT
  job_id,
  start_time,
  user_email,
  total_bytes_processed,
  query
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE creation_time BETWEEN TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 90 DAY) AND CURRENT_TIMESTAMP()
  AND job_type = "QUERY"
  AND end_time BETWEEN TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 90 DAY) AND CURRENT_TIMESTAMP()
ORDER BY total_bytes_processed DESC
Full documentation can be found here: https://cloud.google.com/bigquery/docs/information-schema-jobs
Upvotes: 4
Reputation: 2650
BigQuery has the INFORMATION_SCHEMA.JOBS_BY_* views to retrieve real-time metadata about BigQuery jobs. These views contain currently running jobs as well as the last 180 days of history of completed jobs.
For more info, see Getting jobs metadata using INFORMATION_SCHEMA.
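For example, a minimal sketch (assuming the region-us location and the JOBS_BY_PROJECT view; the one-day interval is just a placeholder) that lists jobs from the last 24 hours, including those still running:
SELECT
  job_id,
  user_email,
  state,  -- RUNNING, PENDING, or DONE
  creation_time,
  total_bytes_processed
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
ORDER BY creation_time DESC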
Upvotes: 1
Reputation: 354
I know it's late, but GCP in its latest releases introduced this new feature of audit logs.
Refer to this - Audit Logs BQ
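If you export these audit logs to a BigQuery dataset through a log sink, you can query the log entries like any other table. A rough sketch, assuming a sink that writes to a dataset named audit_logs in my-project using the legacy AuditData export format (the project, dataset, table, and field names are placeholders and may differ in your export):
SELECT
  timestamp,
  protopayload_auditlog.authenticationInfo.principalEmail AS user_email,
  protopayload_auditlog.servicedata_v1_bigquery.jobCompletedEvent.job.jobConfiguration.query.query AS query_text
FROM `my-project.audit_logs.cloudaudit_googleapis_com_data_access`
WHERE protopayload_auditlog.serviceName = "bigquery.googleapis.com"
ORDER BY timestamp DESC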
Upvotes: 5
Reputation: 207838
In the CLI, you can run bq ls -j -a to retrieve jobs for all users in a project, and you can redirect the output to a file in Cloud Storage.
Then, for each job id, you can run bq show -j <job_id>; to get more details, use the JSON response:
bq show --format=prettyjson -j job_joQEqPwOiOoBlOhDBEgKxQAlKJQ
This returns the following format, which contains your query, the user, the bytes processed, etc.:
{
  "configuration": {
    "dryRun": false,
    "query": {
      "createDisposition": "CREATE_IF_NEEDED",
      "destinationTable": {
        "datasetId": "",
        "projectId": "",
        "tableId": ""
      },
      "query": "",
      "writeDisposition": "WRITE_TRUNCATE"
    }
  },
  "etag": "",
  "id": "",
  "jobReference": {
    "jobId": "",
    "projectId": ""
  },
  "kind": "bigquery#job",
  "selfLink": "",
  "statistics": {
    "creationTime": "1435006022346",
    "endTime": "1435006144730",
    "query": {
      "cacheHit": false,
      "totalBytesProcessed": "105922683030"
    },
    "startTime": "1435006023171",
    "totalBytesProcessed": "105922683030"
  },
  "status": {
    "state": "DONE"
  },
  "user_email": ""
}
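Putting the two commands together, a rough shell sketch (the job count, output file name, and bucket gs://my-bucket are placeholders, and the number of header lines in the bq ls output may vary):
# Dump metadata for the 100 most recent jobs from all users into one file
for job_id in $(bq ls -j -a -n 100 | tail -n +3 | awk '{print $1}'); do
  bq show --format=prettyjson -j "$job_id" >> jobs_metadata.json
done
# Copy the result to Cloud Storage
gsutil cp jobs_metadata.json gs://my-bucket/jobs_metadata.json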
Using the API, you need to pass the allUsers property to list jobs from all users: https://cloud.google.com/bigquery/docs/reference/v2/jobs/list#allUsers
Upvotes: 6