Reputation: 1
I have prepared sample data of around 1.8 GB in three separate tables. I then created a materialized view (containing 7M records) that joins the data of these 3 tables.
Using this materialized view, I created a report in Looker Studio. Without caching, the report initially takes around 18-20 seconds to load data. To create a worst-case scenario, I added a table to the report that shows all the dimensions/columns present in the data.
The query job that Looker Studio issues to prepare the table is as follows:
```sql
SELECT
  t0.address,
  t0.agent_name,
  t0.amount,
  t0.customer_gender,
  t0.customer_id,
  t0.customer_name,
  t0.food_item,
  t0.phone_number,
  DATETIME_TRUNC(SAFE_CAST(t0.date AS DATETIME), SECOND) AS t0_qt_gjznucnwed,
  t0.rating,
  t0.zipCode
FROM project_id.my_dataset.data_mat_view AS t0
GROUP BY t0.address, t0.agent_name, t0.amount, t0.customer_gender, t0.customer_id,
  t0.customer_name, t0.food_item, t0.phone_number, t0_qt_gjznucnwed, t0.rating, t0.zipCode
ORDER BY t0.food_item DESC
LIMIT 2000001;
```
I have already tried partitioning and clustering the data and the materialized view, and also used BI Engine with 3 GB of capacity, but neither shows much improvement in the report's performance.
I am eager to know whether it is possible to load the report in this scenario in under 5 seconds. If YES, how can we do it? If NO, is this the expected behavior? Will the performance remain the same with a larger data size of 3 TB?
Edit: I am using this query to join the 3 tables:
```sql
SELECT
  t2.food_item AS food_item,
  t2.amount AS amount,
  t1.address AS address,
  t3.agent_name AS agent_name,
  t1.gender AS customer_gender,
  t1.customer_id AS customer_id,
  t1.customer_name AS customer_name,
  t1.date AS date,
  t3.phone_number AS phone_number,
  t1.zip_code AS zipCode,
  t3.rating AS rating
FROM project_id.my_dataset.customers_data AS t1
JOIN project_id.my_dataset.orders_data AS t2
  ON t1.customer_id = t2.customer_id
JOIN project_id.my_dataset.agents_data AS t3
  ON t2.order_id = t3.order_id
```
Upvotes: 0
Views: 439
Reputation: 207982
We have been down this road; the solution is a dedicated TABLE for Data Studio reports. It can even be dedicated per chart.
A scheduler regularly replaces the dedicated table used by the Data Studio reports, e.g. run every 5m / 15m / 1h / 24h.
This is achieved with the syntax:

```sql
CREATE OR REPLACE TABLE datastudio.table1 AS
.....
select query1;
.....
CREATE OR REPLACE ........;
```
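A minimal sketch of such a scheduled refresh, reusing the asker's join query (the `datastudio.table1` name is illustrative; in BigQuery this would typically be run as a scheduled query):

```sql
-- Rebuild the dedicated reporting table on a schedule (e.g. every 15 minutes).
-- The joins come from the question; dataset/table names are illustrative.
CREATE OR REPLACE TABLE datastudio.table1 AS
SELECT
  t2.food_item,
  t2.amount,
  t1.address,
  t3.agent_name,
  t1.gender AS customer_gender,
  t1.customer_id,
  t1.customer_name,
  t1.date,
  t3.phone_number,
  t1.zip_code AS zipCode,
  t3.rating
FROM project_id.my_dataset.customers_data AS t1
JOIN project_id.my_dataset.orders_data AS t2
  ON t1.customer_id = t2.customer_id
JOIN project_id.my_dataset.agents_data AS t3
  ON t2.order_id = t3.order_id;
```

Because the report then reads a plain, pre-joined table, Looker Studio's `GROUP BY` over all columns scans one flat table instead of re-evaluating the joins.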
We have many of these small tables optimized for Data Studio.
You can also cluster these tables based on the elements that you use as filters: date, dimension1, d2, d3, etc.
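For example, clustering the dedicated table on the columns most often used as report filters (a sketch only: BigQuery allows at most four clustering columns, and `PARTITION BY DATE(date)` assumes the `date` column is a DATETIME or TIMESTAMP, which may require a cast given the `SAFE_CAST` in the question):

```sql
-- Dedicated reporting table, partitioned by day and clustered on filter columns.
CREATE OR REPLACE TABLE datastudio.table1
PARTITION BY DATE(date)
CLUSTER BY food_item, customer_gender, zipCode
AS
SELECT * FROM project_id.my_dataset.data_mat_view;
```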
BI Engine works flawlessly because this table is not changing. If you opt for a 15-minute schedule, the cache is invalidated only after that period. Our BI Engine is 4 GB in size.
Disable time travel on these datasets.
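Note that BigQuery does not let you turn time travel off entirely; you can, however, lower the window to its 48-hour minimum at the dataset level:

```sql
-- Reduce the time-travel window to the minimum (48 hours).
ALTER SCHEMA datastudio
SET OPTIONS (max_time_travel_hours = 48);
```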
Create multiple tables if different clustering schemes are needed, accepting the additional table-refresh workload.
Upvotes: 1