Reputation: 315
Is it possible to have all BigQuery requests logged to a file in Cloud Storage (or, even better, into a BigQuery table)? The --apilog option available in bq seems to be intended mainly for debugging, but what I'd like to do is keep track of all queries, much like logging all access requests on a particular file in Cloud Storage.
To be more specific, I don't just want to log my own queries, but (a) queries by all users within the same project, and ideally also (b) queries by anyone touching a table in a dataset that I own.
Upvotes: 12
Views: 14975
Reputation: 71
There's a better way to do this now with the INFORMATION_SCHEMA tables.
Here's a simple way to get all queries from a project in the last 90 days:
SELECT
  job_id,
  start_time,
  user_email,
  total_bytes_processed,
  query
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE creation_time BETWEEN TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 90 DAY) AND CURRENT_TIMESTAMP()
  AND job_type = "QUERY"
  AND end_time BETWEEN TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 90 DAY) AND CURRENT_TIMESTAMP()
ORDER BY total_bytes_processed DESC
Full documentation can be found here: https://cloud.google.com/bigquery/docs/information-schema-jobs
Upvotes: 4
Reputation: 2650
BigQuery has the INFORMATION_SCHEMA.JOBS_BY_* views to retrieve real-time metadata about BigQuery jobs. These views contain currently running jobs as well as the last 180 days of history of completed jobs.
For more info, see Getting jobs metadata using INFORMATION_SCHEMA.
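For example, a minimal sketch (assuming the region-us location and the JOBS_BY_PROJECT view; the one-day interval is just a placeholder) that lists jobs from the last 24 hours, including those still running:
SELECT
  job_id,
  user_email,
  state,  -- RUNNING, PENDING, or DONE
  creation_time,
  total_bytes_processed
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
ORDER BY creation_time DESC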
Upvotes: 1
Reputation: 354
I know it's late, but GCP in its latest releases introduced this new feature of audit logs.
Refer to this - Audit Logs BQ
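If you export these audit logs to a BigQuery dataset through a log sink, you can query the log entries like any other table. A rough sketch, assuming a sink that writes to a dataset named audit_logs in my-project using the legacy AuditData export format (the project, dataset, table, and field names are placeholders and may differ in your export):
SELECT
  timestamp,
  protopayload_auditlog.authenticationInfo.principalEmail AS user_email,
  protopayload_auditlog.servicedata_v1_bigquery.jobCompletedEvent.job.jobConfiguration.query.query AS query_text
FROM `my-project.audit_logs.cloudaudit_googleapis_com_data_access`
WHERE protopayload_auditlog.serviceName = "bigquery.googleapis.com"
ORDER BY timestamp DESC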
Upvotes: 5
Reputation: 207838
In the CLI, you can run bq ls -j -a to retrieve jobs for all users in a project, and you can redirect the output to a file in Cloud Storage.
Then, for each job id, you can run bq show -j <job_id>; to get more details, use the JSON response:
bq show --format=prettyjson -j job_joQEqPwOiOoBlOhDBEgKxQAlKJQ
This returns the following format, which contains your query, the user, the bytes processed, etc.:
{
  "configuration": {
    "dryRun": false,
    "query": {
      "createDisposition": "CREATE_IF_NEEDED",
      "destinationTable": {
        "datasetId": "",
        "projectId": "",
        "tableId": ""
      },
      "query": "",
      "writeDisposition": "WRITE_TRUNCATE"
    }
  },
  "etag": "",
  "id": "",
  "jobReference": {
    "jobId": "",
    "projectId": ""
  },
  "kind": "bigquery#job",
  "selfLink": "",
  "statistics": {
    "creationTime": "1435006022346",
    "endTime": "1435006144730",
    "query": {
      "cacheHit": false,
      "totalBytesProcessed": "105922683030"
    },
    "startTime": "1435006023171",
    "totalBytesProcessed": "105922683030"
  },
  "status": {
    "state": "DONE"
  },
  "user_email": ""
}
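Putting the two commands together, a rough shell sketch (the job count, output file name, and bucket gs://my-bucket are placeholders, and the number of header lines in the bq ls output may vary):
# Dump metadata for the 100 most recent jobs from all users into one file
for job_id in $(bq ls -j -a -n 100 | tail -n +3 | awk '{print $1}'); do
  bq show --format=prettyjson -j "$job_id" >> jobs_metadata.json
done
# Copy the result to Cloud Storage
gsutil cp jobs_metadata.json gs://my-bucket/jobs_metadata.json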
Using the API, you need to pass the allUsers property to list jobs from all users: https://cloud.google.com/bigquery/docs/reference/v2/jobs/list#allUsers
Upvotes: 6