beta
beta

Reputation: 2560

Collecting per-day results for queries over date-partitioned tables

I have some tables with names such as counts_20171220 in BigQuery with rows in this format (there is only one row per contentId per date):

| contentId | views |
+-----------+-------+
| cb32edc0  | 728324|
| 52cbb1ff  | 643220|
...

I want to generate a list of the view count for a given contentId for a given time range, without gaps, e.g:

|       date |  views |
+------------+--------+
| 2017-12-01 |   NULL | -- or 0
| 2017-12-02 |   NULL | -- or 0
| 2017-12-03 | 728314 |
| 2017-12-04 | 328774 |
| 2017-12-05 |  28242 |
...
| 2017-12-20 |   NULL | -- or 0

In order to do this, I guess I need to use * and _table_suffix, but I'm not able to figure out how to include the dates for which there is no entry for that contentId. The closest I have got is this query:

#standardSQL
SELECT
  _table_suffix AS date,
  ARRAY_AGG(views) AS views
FROM
  `test.counts_*`
WHERE
  _table_suffix BETWEEN '20171201' AND '20171220'
  AND contentId = 'cb32edc0'
GROUP BY
  _table_suffix,
  contentId
ORDER BY
  date

The problem with this query is

  1. it does not include rows for all dates, only for dates whose table had an entry for contentId 'cb32edc0'
  2. due to the way it's structured I need to use a (useless here) aggregate function to extract views

How should I structure a query like this? I'd be happy to get both specific help for this query and general pointers on how to achieve things like this where one queries over date-partitioned tables.

Upvotes: 1

Views: 56

Answers (1)

Elliott Brossard
Elliott Brossard

Reputation: 33765

This should work, assuming that there is at least one row for each date:

SELECT
  PARSE_DATE('%Y%m%d', _TABLE_SUFFIX) AS date,
  IFNULL(MAX(IF(contentId = 'cb32edc0', views, NULL)), 0) AS views
FROM `test_counts_*`
WHERE _TABLE_SUFFIX BETWEEN '20171201' AND '20171220'
GROUP BY date;

Instead of explicitly filtering the rows that don't match the desired contentId, it uses a condition with an aggregate function to exclude them from the result. If there are no rows in the group with the desired contentId, the IFNULL ensures that the expression returns 0 rather than NULL.

Upvotes: 1

Related Questions