Ankit Atrey

Reputation: 1

Performance of Looker Studio's Query in BigQuery

I have prepared sample data of around 1.8 GB across three separate tables. I have then created a materialized view (containing 7M records) that joins the data of these three tables.

Using this materialized view, I have created a report in Looker Studio. Without caching, the report initially takes around 18-20 seconds to load. To create a worst-case scenario, I added a table to the report showing all the dimensions/columns present in the data.

The query job that Looker Studio issues to prepare the table is as follows:

    SELECT
      t0.address, t0.agent_name, t0.amount, t0.customer_gender, t0.customer_id,
      t0.customer_name, t0.food_item, t0.phone_number,
      DATETIME_TRUNC(SAFE_CAST(t0.date AS DATETIME), SECOND) AS t0_qt_gjznucnwed,
      t0.rating, t0.zipCode
    FROM project_id.my_dataset.data_mat_view AS t0
    GROUP BY
      t0.address, t0.agent_name, t0.amount, t0.customer_gender, t0.customer_id,
      t0.customer_name, t0.food_item, t0.phone_number, t0_qt_gjznucnwed,
      t0.rating, t0.zipCode
    ORDER BY t0.food_item DESC
    LIMIT 2000001;

I have also tried partitioning and clustering the underlying data and the materialized view, and enabled BI Engine with 3 GB of capacity, but neither showed much improvement in the report's performance.

I am eager to know whether it is possible to load the report in this scenario in under 5 seconds. If yes, how can we do it? If no, is this the expected behavior? Will the performance remain the same with a larger data size of 3 TB?

Edit: I am using this query to join the 3 tables:

    SELECT
      t2.food_item AS food_item, t2.amount AS amount, t1.address AS address,
      t3.agent_name AS agent_name, t1.gender AS customer_gender,
      t1.customer_id AS customer_id, t1.customer_name AS customer_name,
      t1.date AS date, t3.phone_number AS phone_number,
      t1.zip_code AS zipCode, t3.rating AS rating
    FROM project_id.my_dataset.customers_data AS t1
    JOIN project_id.my_dataset.orders_data AS t2
      ON t1.customer_id = t2.customer_id
    JOIN project_id.my_dataset.agents_data AS t3
      ON t2.order_id = t3.order_id
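For context, the materialized view over that join can be declared roughly as follows. This is a sketch only: the dataset and view names mirror the question, and BigQuery places restrictions on which join shapes a materialized view may contain, so the exact definition may need adjusting.

    CREATE MATERIALIZED VIEW project_id.my_dataset.data_mat_view AS
    SELECT
      t2.food_item, t2.amount, t1.address, t3.agent_name,
      t1.gender AS customer_gender, t1.customer_id, t1.customer_name,
      t1.date, t3.phone_number, t1.zip_code AS zipCode, t3.rating
    FROM project_id.my_dataset.customers_data AS t1
    JOIN project_id.my_dataset.orders_data AS t2
      ON t1.customer_id = t2.customer_id
    JOIN project_id.my_dataset.agents_data AS t3
      ON t2.order_id = t3.order_id;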

Upvotes: 0

Views: 439

Answers (1)

Pentium10

Reputation: 207982

We have been down this road; the solution is a dedicated table for Data Studio reports. It can even be dedicated per chart.

  1. A scheduler that regularly replaces the dedicated table for Data Studio reports, e.g. run every 5 min / 15 min / 1 h / 24 h.

  2. This is achieved with syntax like:

    CREATE OR REPLACE TABLE datastudio.table1 AS  
    .....  
    select query1;  
    .....  
    CREATE OR REPLACE ........;  

We have many of these small tables optimized for Data Studio.

You can also cluster these tables based on the elements you use as filters: date, dimension1, d2, d3, etc.
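The two points above can be combined into a single statement that rebuilds the dedicated table with partitioning and clustering in place, saved as a BigQuery scheduled query. This is a sketch: the table name comes from the answer, the clustering columns are illustrative picks from the question's schema, and `PARTITION BY DATE(date)` assumes `date` is a DATETIME or TIMESTAMP column.

    -- Rebuild the report-serving table on each scheduled run.
    CREATE OR REPLACE TABLE datastudio.table1
    PARTITION BY DATE(date)              -- partition on the report's date filter
    CLUSTER BY food_item, customer_gender  -- cluster on common filter dimensions
    AS
    SELECT *
    FROM project_id.my_dataset.data_mat_view;

Because the report then reads a small, pre-joined, pre-clustered table, Looker Studio's generated query scans far less data than it would against the materialized view directly.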

  1. BI Engine works flawlessly, because this table is not changing. If you opt for a 15-minute schedule, the cache is invalidated only after that period. Our BI Engine is 4 GB in size.

  2. Disable time travel on these datasets.

  3. Create multiple tables if different clustering schemes are needed, accepting the additional integration workload of maintaining those extra tables.
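Regarding point 2: time travel cannot be switched off entirely, but it can be lowered to the 48-hour minimum at the dataset level, which reduces the storage kept for replaced tables. A sketch, with an illustrative dataset name:

    -- 48 is the minimum allowed value for max_time_travel_hours.
    ALTER SCHEMA datastudio
    SET OPTIONS (max_time_travel_hours = 48);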

Upvotes: 1
