Reputation: 2560
I have some tables with names such as counts_20171220
in BigQuery with rows in this format (there is only one row per contentId
per date):
| contentId | views |
+-----------+-------+
| cb32edc0 | 728324|
| 52cbb1ff | 643220|
...
I want to generate a list of the view count for a given contentId
for a given time range, without gaps, e.g:
| date | views |
+------------+--------+
| 2017-12-01 | NULL | -- or 0
| 2017-12-02 | NULL | -- or 0
| 2017-12-03 | 728314 |
| 2017-12-04 | 328774 |
| 2017-12-05 | 28242 |
...
| 2017-12-20 | NULL | -- or 0
In order to do this, I guess I need to use *
and _table_suffix
, but I'm not able to figure out how to include the dates for which there is no entry for that contentId
. The closest I have got is this query:
#standardSQL
SELECT
_table_suffix AS date,
ARRAY_AGG(views) AS views
FROM
`test.counts_*`
WHERE
_table_suffix BETWEEN '20171201' AND '20171220'
AND contentId = 'cb32edc0'
GROUP BY
_table_suffix,
contentId
ORDER BY
date
The problem with this query is
contentId
'cb32edc0'views
How should I structure a query like this? I'd be happy to get both specific help for this query and general pointers on how to achieve things like this where one queries over date-partitioned tables.
Upvotes: 1
Views: 56
Reputation: 33765
This should work, assuming that there is at least one row for each date:
SELECT
PARSE_DATE('%Y%m%d', _TABLE_SUFFIX) AS date,
IFNULL(MAX(IF(contentId = 'cb32edc0', views, NULL)), 0) AS views
FROM `test_counts_*`
WHERE _TABLE_SUFFIX BETWEEN '20171201' AND '20171220'
GROUP BY date;
Instead of explicitly filtering the rows that don't match the desired contentId
, it uses a condition with an aggregate function to exclude them from the result. If there are no rows in the group with the desired contentId
, the IFNULL
ensures that the expression returns 0
rather than NULL
.
Upvotes: 1