Reputation: 344
I'm trying to find out how much it costs (in BigQuery) when a user queries a Dashboard in Looker. We are consistently going over budget because users query large amounts of data through Dashboards, and I would like to get a handle on this.
I'm not after a solution to reduce my costs (yet), but a way to quantify the costs associated with each Dashboard.
Now, when you set up a Look and apply your filters, Looker lets you know how much data running it will scan; the estimate appears next to the Run button.
However, once that Look is added to a Dashboard you lose that functionality. If I had 10 of the same Tile on my Dashboard I could ballpark it at 10.7 GB. But when Tiles are built on Looks which query different amounts of data, and your Dashboard has date filters etc., it becomes very difficult.
Looker does not seem to have any built-in reporting for this. All it seems able to report on is runtime, not the amount of GB queried. (Ideally you would be able to report on GB queried by user, but I digress.)
As a workaround, I currently have a table in BigQuery which contains every query that is run, by user ID, along with the actual query text, GBs queried and cost. I'm therefore able to see all the queries Looker performs, their size and their cost. (I have this in Looker Studio.) The issue is that a Dashboard (like the example above) may perform 10 queries (one for each Tile) each time it runs. Currently I have to inspect each query to try to work out which dashboard it came from, which is not very practical. I want to do this at scale and assign each query to a specific dashboard, so I can report on the total cost each Dashboard runs up each month.
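For reference, such a table can be derived from BigQuery's INFORMATION_SCHEMA.JOBS_BY_PROJECT view. A minimal sketch follows; the `region-us` qualifier and the $5/TB on-demand rate are placeholders to adjust for your own project:

```sql
-- Sketch: per-query usage and estimated cost from BigQuery job metadata.
-- `region-us` and the $5/TB on-demand rate are assumptions; adjust for your project.
SELECT
  user_email,                                      -- who ran the query
  query,                                           -- the SQL text Looker generated
  total_bytes_billed / POW(1024, 3) AS gb_billed,
  total_bytes_billed / POW(1024, 4) * 5.0 AS estimated_cost_usd,
  creation_time
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE job_type = 'QUERY'
  AND creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
ORDER BY total_bytes_billed DESC;
```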
I thought of using a CASE statement to look for some unique part of the query; however, in a lot of cases Tiles use similar queries. I also wondered if it was possible to somehow insert a comment into the query so that I could search for it, perhaps via a custom filter which was always true.
Unfortunately, comments seem to be stripped out of the SQL, so they would not appear in the query in BigQuery.
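For illustration, the CASE idea amounts to something like this (the table name, LIKE patterns and dashboard names are all made up, and as said, Tiles often share similar SQL, so the patterns are unreliable):

```sql
-- Hypothetical sketch: pattern-match the query text to guess the source dashboard.
-- Unreliable when several Tiles or Dashboards generate similar SQL.
SELECT
  query,
  total_bytes_billed,
  CASE
    WHEN query LIKE '%orders_summary%' THEN 'Sales Dashboard'
    WHEN query LIKE '%web_sessions%'   THEN 'Traffic Dashboard'
    ELSE 'Unknown'
  END AS dashboard_guess
FROM `my-project.usage.looker_query_costs`;
```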
That's what I am trying to achieve and what I have tried so far. If you have any thoughts on how I might achieve this or think I am perhaps looking at the wrong kind of solution, please let me know.
Upvotes: 0
Views: 1774
Reputation: 131
Here's how you can do it in several steps:

1. Use the History native Looker explore to get data on queries, including the Dashboard ID or Title they are linked to. This query may do the job: https://your-instance.looker.com/explore/system__activity/history?fields=query.id,query.created_time,query.view,query.model,look.id,look.title,dashboard.id,dashboard.title,user.email,user.name,query.formatted_fields,query.filters,query.formatted_pivots,query.dynamic_fields&f[query.created_time]=24+hours (replace https://your-instance.looker.com with your instance's URL). Save that explore as a Look, query it on a daily basis, save the results (for instance on GCS) and load that data into BigQuery.

2. The History explore gives you a query_id like 1,123,123. You'll want to link that id to a BigQuery job. Use the Query Performance Metrics native Looker explore in order to do so. This query may do the job: https://your-instance.looker.com/explore/system__activity/query_metrics?fields=query_metrics.bigquery_job_id,query.id&f[query_metrics.bigquery_job_id]=-NULL&f[query.created_time]=24+hours (replace https://your-instance.looker.com with your instance's URL). Save that explore as a Look, query it on a daily basis, and load those results into BigQuery as well.

3. Create a view (or derived table) joining the INFORMATION_SCHEMA.JOBS_BY_PROJECT table and the two tables mentioned above. This should enable you to get the average and total amount of bytes billed for each Dashboard. If you know how much you pay per TB, you can easily create a dimension on that view, or simply a column in your derived table, to get the cost; a sketch of that join follows after this list.
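To make step 3 concrete, here is a minimal sketch of such a join. The `my-project.looker_usage.history` and `my-project.looker_usage.query_metrics` tables stand in for wherever you loaded the two Looks, their column names are assumptions about how the export lands, and the $5/TB on-demand rate should be replaced with your actual pricing:

```sql
-- Sketch: attribute BigQuery bytes billed back to Looker Dashboards.
-- Table names, column names and the $5/TB rate below are assumptions.
SELECT
  h.dashboard_id,
  h.dashboard_title,
  COUNT(*) AS query_runs,
  SUM(j.total_bytes_billed) / POW(1024, 3) AS total_gb_billed,
  AVG(j.total_bytes_billed) / POW(1024, 3) AS avg_gb_billed,
  SUM(j.total_bytes_billed) / POW(1024, 4) * 5.0 AS estimated_cost_usd
FROM `my-project.looker_usage.history` AS h          -- loaded from the History Look
JOIN `my-project.looker_usage.query_metrics` AS m    -- loaded from the Query Performance Metrics Look
  ON h.query_id = m.query_id                         -- Looker query id links the two Looks
JOIN `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT AS j
  ON m.bigquery_job_id = j.job_id                    -- BigQuery job id links to billing metadata
WHERE j.job_type = 'QUERY'
  AND j.creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
  AND h.dashboard_id IS NOT NULL                     -- keep only Dashboard-driven queries
GROUP BY h.dashboard_id, h.dashboard_title
ORDER BY estimated_cost_usd DESC;
```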
Upvotes: 4