Reputation: 95
I have a query with strange behavior: on some occasions it takes 120 seconds and on others 250 seconds. I need to minimize the execution time, but I cannot find a guide to help me improve the runtime of my queries.
The table containing the information has the following characteristics:
The query is this:
SELECT
todayInfo.client AS Client,
todayInfo.todayInfo AS Today,
todayInfo.dayAgoInfo AS DayAgo,
todayInfo.threeDaysAgo AS ThreeDaysAgo,
todayInfo.weekAgo AS weekAgo
FROM (
SELECT
client,
SUM(IF( bp_Time BETWEEN TIMESTAMP('2016/01/01')
AND TIMESTAMP('2016/01/31'),1,0)) AS todayInfo,
SUM(IF( bp_Time BETWEEN DATE_ADD(TIMESTAMP('2016/01/01'), - 1,"DAY")
AND DATE_ADD(TIMESTAMP('2016/01/31'), - 1,"DAY"),1,0)) AS dayAgoInfo,
SUM(IF( bp_Time BETWEEN DATE_ADD(TIMESTAMP('2016/01/01'), - 3,"DAY")
AND DATE_ADD(TIMESTAMP('2016/01/31'), - 3,"DAY"),1,0)) AS threeDaysAgo,
SUM(IF( bp_Time BETWEEN DATE_ADD(TIMESTAMP('2016/01/01'), - 8,"DAY")
AND DATE_ADD(TIMESTAMP('2016/01/31'), - 8,"DAY"),1,0)) AS weekAgo
FROM
[dataset.table]
GROUP BY
client
) AS todayInfo
ORDER BY
Today DESC
LIMIT 10
Is there a guide or tips that can help me optimize runtimes, not only for this query but also for future queries?
Upvotes: 2
Views: 1059
Reputation: 1004
In this link you can find more information on how BigQuery works behind the scenes. Understanding slots is key to optimizing queries. Slots are units of computational capacity, and data is shuffled between them depending on the joins, ORDER BY, GROUP BY, functions, subqueries, and more.
In your case, because you compute SUM, IF, and BETWEEN in your query, you are demanding more computation. Functions that could be useful in your case are DATE_SUB and COUNTIF():
SELECT COUNTIF(DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY) = end_date) AS dayAgoInfo FROM dataset.table;
Upvotes: 1
Reputation: 13954
Because of the way BigQuery works, it isn't as tunable as relational databases. It does not use indexes, and every query is a full scan.
That being said, you need to learn more about why your queries are slow. This depends a lot on the specifics of your data. You can use the query plan explanation to get a good idea of where BigQuery is spending its time.
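As a sketch of how to look at that plan (assuming the `bq` command-line tool is installed and authenticated; the project and job ID below are placeholders you would replace with ones from your own query history):

```shell
# Placeholder job ID -- substitute a real one from your project's query history.
# The statistics.query.queryPlan section of the output lists each execution
# stage along with the relative time spent waiting, reading, computing,
# and writing, which shows where BigQuery is spending its time.
bq show --format=prettyjson -j my-project:bquijob_1234abcd
```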
But, as @Pentium10 mentioned in the comment, you're probably going to have to partition your data by day to reduce the size of the scans. It's explained in more detail in this answer.
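As a minimal sketch of what that buys you (assuming `dataset.table` were recreated as a day-partitioned table; `_PARTITIONTIME` is BigQuery's pseudo-column for date-partitioned tables):

```sql
-- Sketch only: assumes dataset.table has been recreated as a
-- day-partitioned table. The _PARTITIONTIME filter limits the scan to the
-- partitions covering 2015-12-24 (8 days before 2016-01-01, the earliest
-- date your query touches) through 2016-01-31, instead of scanning the
-- whole table on every run.
SELECT
  client,
  COUNTIF(bp_Time BETWEEN TIMESTAMP('2016-01-01')
                      AND TIMESTAMP('2016-01-31')) AS todayInfo
FROM
  dataset.table
WHERE
  _PARTITIONTIME BETWEEN TIMESTAMP('2015-12-24') AND TIMESTAMP('2016-01-31')
GROUP BY
  client
```

The date filters on `bp_Time` still do the per-window bucketing; the `_PARTITIONTIME` predicate is what reduces the amount of data scanned (and billed).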
Upvotes: 2