Reputation: 3
When I query BQ with ROLLUP by potentially large set of grouping criteria field (in this case campaign_group_id
)
for example:
SELECT
campaign_group_id AS campaign_group_id,
DATE(DATE_ADD(TIME, 3, 'HOUR')) AS DAY,
SUM(impressions) AS imps
FROM
[browser_traffic.2016_05_28],
[browser_traffic.2016_05_29]
WHERE
( DATE_ADD( TIME, 3, "HOUR") >= '2016-05-28 00:00:00'
AND DATE_ADD( TIME, 3, "HOUR") < '2016-05-30 00:00:00' )
GROUP EACH BY ROLLUP (campaign_group_id, DAY)
ORDER BY DAY ASC, campaign_group_id ASC
LIMIT 500
BQ returns many rows with subtotals which is not applicable in my use case:
+-------------------+------+-----------+
| campaign_group_id | day | imps |
+-------------------+------+-----------+
| NULL | NULL | 158423933 |
| 61 | NULL | 0 |
| 496 | NULL | 79870 |
| 497 | NULL | 10492 |
| 809 | NULL | 0 |
| 936 | NULL | 2451 |
| 937 | NULL | 0 |
| 940 | NULL | 6844 |
| 942 | NULL | 207685 |
| 946 | NULL | 0 |
| 961 | NULL | 0 |
| 975 | NULL | 16167 |
| 976 | NULL | 15767 |
| 1018 | NULL | 0 |
| 1020 | NULL | 0 |
| 1022 | NULL | 766875 |
| 1039 | NULL | 355765 |
...
I need to somehow reduce subtotal rows from result but leave intact grand total row (which is first row in above result)
Is it possible that BQ returns only grand total row for selected fields?
Upvotes: 0
Views: 569
Reputation: 59225
Notice that the function GROUPING() exists too, and will help you with this:
SELECT year, name, SUM(number) s,
GROUPING(year) is_grouping_year,
GROUPING(name) is_grouping_name
FROM [bigquery-public-data:usa_names.usa_1910_2013]
WHERE name IN ('John', 'Jovana')
AND year BETWEEN 2012 AND 2013
GROUP BY ROLLUP(name, year)
ORDER BY year, name
year name s is_grouping_year is_grouping_name
null null 21182 1 1
null John 21164 1 0
null Jovana 18 1 0
2012 John 10576 0 0
2012 Jovana 18 0 0
2013 John 10588 0 0
Basically you are asking for the rows where is_grouping_year
and is_grouping_name
are either both 0 or 1.
From the docs:
When using the ROLLUP function, you can use the GROUPING function to distinguish between rows that were added because of the ROLLUP function and rows that actually have a NULL value for the group key.
https://cloud.google.com/bigquery/query-reference#groupby
Upvotes: 0
Reputation: 7744
You can filter the result of the query with another SELECT statement:
SELECT campaign_group_id, day, imps
FROM (
... your rollup query with LIMIT removed ...
)
WHERE (day IS NOT NULL) OR (campaign_group_id IS NULL)
LIMIT 500
Upvotes: 1
Reputation: 173046
If you are interested in grand totals - you most likely do not need ROLLUP
Rather you can consider regular GROUP BY as below
SELECT
DATE(DATE_ADD(TIME, 3, 'HOUR')) AS DAY,
SUM(impressions) AS imps
FROM
[browser_traffic.2016_05_28],
[browser_traffic.2016_05_29]
WHERE
( DATE_ADD( TIME, 3, "HOUR") >= '2016-05-28 00:00:00'
AND DATE_ADD( TIME, 3, "HOUR") < '2016-05-30 00:00:00' )
GROUP 1
ORDER BY DAY ASC
LIMIT 500
Upvotes: 0