Andrew
Andrew

Reputation: 3

Can BigQuery drop subtotals rows WHEN responding to ROLLUP query

When I query BQ with ROLLUP by potentially large set of grouping criteria field (in this case campaign_group_id)

for example:

SELECT
  campaign_group_id AS campaign_group_id,
  DATE(DATE_ADD(TIME, 3, 'HOUR')) AS DAY,
  SUM(impressions) AS imps
FROM
  [browser_traffic.2016_05_28],
  [browser_traffic.2016_05_29]
WHERE
  ( DATE_ADD( TIME, 3, "HOUR") >= '2016-05-28 00:00:00'
    AND DATE_ADD( TIME, 3, "HOUR") < '2016-05-30 00:00:00' )
GROUP EACH BY ROLLUP (campaign_group_id, DAY)
ORDER BY DAY ASC, campaign_group_id ASC
LIMIT 500

BQ returns many rows with subtotals which is not applicable in my use case:

+-------------------+------+-----------+
| campaign_group_id | day  |   imps    |
+-------------------+------+-----------+
|              NULL | NULL | 158423933 |
|                61 | NULL |         0 |
|               496 | NULL |     79870 |
|               497 | NULL |     10492 |
|               809 | NULL |         0 |
|               936 | NULL |      2451 |
|               937 | NULL |         0 |
|               940 | NULL |      6844 |
|               942 | NULL |    207685 |
|               946 | NULL |         0 |
|               961 | NULL |         0 |
|               975 | NULL |     16167 |
|               976 | NULL |     15767 |
|              1018 | NULL |         0 |
|              1020 | NULL |         0 |
|              1022 | NULL |    766875 |
|              1039 | NULL |    355765 |
...

I need to somehow reduce subtotal rows from result but leave intact grand total row (which is first row in above result)

Is it possible that BQ returns only grand total row for selected fields?

Upvotes: 0

Views: 569

Answers (3)

Felipe Hoffa
Felipe Hoffa

Reputation: 59225

Notice that the function GROUPING() exists too, and will help you with this:

SELECT year, name, SUM(number) s,
       GROUPING(year) is_grouping_year, 
       GROUPING(name) is_grouping_name
FROM [bigquery-public-data:usa_names.usa_1910_2013] 
WHERE name IN ('John', 'Jovana')
AND year BETWEEN 2012 AND 2013
GROUP BY ROLLUP(name, year)
ORDER BY year, name

year    name    s       is_grouping_year    is_grouping_name     
null    null    21182   1                   1    
null    John    21164   1                   0    
null    Jovana  18      1                   0    
2012    John    10576   0                   0    
2012    Jovana  18      0                   0    
2013    John    10588   0                   0

Basically you are asking for the rows where is_grouping_year and is_grouping_name are either both 0 or 1.

From the docs:

When using the ROLLUP function, you can use the GROUPING function to distinguish between rows that were added because of the ROLLUP function and rows that actually have a NULL value for the group key.

https://cloud.google.com/bigquery/query-reference#groupby

Upvotes: 0

Michael Entin
Michael Entin

Reputation: 7744

You can filter the result of the query with another SELECT statement:

SELECT campaign_group_id, day, imps 
FROM (
    ... your rollup query with LIMIT removed ...
)
WHERE (day IS NOT NULL) OR (campaign_group_id IS NULL)
LIMIT 500

Upvotes: 1

Mikhail Berlyant
Mikhail Berlyant

Reputation: 173046

If you are interested in grand totals - you most likely do not need ROLLUP

Rather you can consider regular GROUP BY as below

SELECT
  DATE(DATE_ADD(TIME, 3, 'HOUR')) AS DAY,
  SUM(impressions) AS imps
FROM
  [browser_traffic.2016_05_28],
  [browser_traffic.2016_05_29]
WHERE
  ( DATE_ADD( TIME, 3, "HOUR") >= '2016-05-28 00:00:00'
    AND DATE_ADD( TIME, 3, "HOUR") < '2016-05-30 00:00:00' )
GROUP 1
ORDER BY DAY ASC
LIMIT 500

Upvotes: 0

Related Questions