Cristian Arboleda

Reputation: 95

Query Tuning Bigquery

I have a query with strange behavior: on some occasions it takes 120 seconds and on others 250 seconds. I need to minimize the execution time, but I cannot find a guide to help me improve the runtime of my queries.

The table containing the information has the following characteristics:

  1. All fields are repeated
  2. It has 800 million records
  3. The query processes 14.2 GB

The query is this:

SELECT
 todayInfo.client AS Client,
  todayInfo.todayInfo AS Today,
  todayInfo.dayAgoInfo AS DayAgo,
  todayInfo.threeDaysAgo AS ThreeDaysAgo,
  todayInfo.weekAgo AS weekAgo
FROM (
  SELECT
    client,
    SUM(IF(bp_Time BETWEEN TIMESTAMP('2016-01-01')
    AND TIMESTAMP('2016-01-31'), 1, 0)) AS todayInfo,
    SUM(IF(bp_Time BETWEEN DATE_ADD(TIMESTAMP('2016-01-01'), -1, "DAY")
    AND DATE_ADD(TIMESTAMP('2016-01-31'), -1, "DAY"), 1, 0)) AS dayAgoInfo,
    SUM(IF(bp_Time BETWEEN DATE_ADD(TIMESTAMP('2016-01-01'), -3, "DAY")
    AND DATE_ADD(TIMESTAMP('2016-01-31'), -3, "DAY"), 1, 0)) AS threeDaysAgo,
    SUM(IF(bp_Time BETWEEN DATE_ADD(TIMESTAMP('2016-01-01'), -8, "DAY")
    AND DATE_ADD(TIMESTAMP('2016-01-31'), -8, "DAY"), 1, 0)) AS weekAgo
  FROM
    [dataset.table]
   GROUP BY
    client   
  ) AS todayInfo

ORDER BY
    Today DESC
LIMIT 10

Is there a guide, or are there tips, that can help me optimize runtimes, not only for this query but also for future queries?

Upvotes: 2

Views: 1059

Answers (2)

Nathan Nasser

Reputation: 1004

In this link you can find more information on how BigQuery works behind the scenes. Understanding slots is key to optimizing queries. Slots are units of computational capacity, and data is shuffled between them depending on the joins, ORDER BY, GROUP BY, functions, subqueries, and more.

In your case, computing SUM, IF, and BETWEEN in your query demands more computation. Some functions that would be useful in your case are DATE_SUB and COUNTIF().

SELECT COUNTIF(DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY) = end_date) AS dayAgoInfo FROM `dataset.table`;
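Applied to the question's query, the same counts can be expressed with COUNTIF in standard SQL. This is a sketch assuming the `client` and `bp_Time` columns and the `dataset.table` name from the question; only two of the four windows are shown, the others follow the same pattern:

```sql
-- Standard SQL rewrite of the original aggregation (sketch; table and
-- column names are taken from the question).
SELECT
  client,
  -- COUNTIF replaces the SUM(IF(..., 1, 0)) pattern.
  COUNTIF(bp_Time BETWEEN TIMESTAMP('2016-01-01')
      AND TIMESTAMP('2016-01-31')) AS todayInfo,
  -- TIMESTAMP_SUB replaces legacy DATE_ADD with a negative interval.
  COUNTIF(bp_Time BETWEEN TIMESTAMP_SUB(TIMESTAMP('2016-01-01'), INTERVAL 1 DAY)
      AND TIMESTAMP_SUB(TIMESTAMP('2016-01-31'), INTERVAL 1 DAY)) AS dayAgoInfo
FROM
  `dataset.table`
GROUP BY
  client
ORDER BY
  todayInfo DESC
LIMIT 10;
```

This avoids the extra subquery and the per-row IF arithmetic, though the main cost in BigQuery is still the amount of data scanned.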

Upvotes: 1

mimming

Reputation: 13954

Because of the way BigQuery works, it isn't as tunable as relational databases. It does not use indexes, and every query is a full scan.

That being said, you need to learn more about why your queries are slow. This depends a lot on the specifics of your data. You can use the query plan explanation to get a good idea of where BigQuery is spending its time.

But, as @Pentium10 mentioned in the comments, you're probably going to have to partition your data by day to reduce the size of the scans. It's explained in more detail in this answer.
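With a day-partitioned table, the scan can be restricted to the dates you actually need via the `_PARTITIONTIME` pseudo-column. A minimal sketch, assuming a partitioned table named `dataset.partitioned_table` (hypothetical) with the same columns as in the question:

```sql
-- Sketch: scan only one month of a day-partitioned table instead of
-- all 800 million rows (table name is assumed).
SELECT
  client,
  COUNT(*) AS todayInfo
FROM
  `dataset.partitioned_table`
WHERE
  -- Pruning on _PARTITIONTIME limits which partitions are read,
  -- which directly reduces bytes processed and runtime.
  _PARTITIONTIME BETWEEN TIMESTAMP('2016-01-01') AND TIMESTAMP('2016-01-31')
GROUP BY
  client;
```

The filter on `_PARTITIONTIME` must appear in the WHERE clause for partition pruning to take effect; filtering in an outer query after the scan does not reduce the bytes read.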

Upvotes: 2
