Reputation: 107
We collect data about the software tools used across our team. One table we use to represent tool use across our team simply counts the number of times that tool was launched per day across the team:
SELECT
day
, num_launches
, tool_name
FROM
dataset.by_tool_by_day
ORDER BY day ASC
;
Gives you:
| day | num_launches | tool_name |
|-------------|--------------|-----------|
| 2019-12-20 | 1 | Tool A |
| 2019-12-20 | 11 | Tool B |
| 2019-12-20 | 30 | Tool C |
| 2019-12-21 | 14 | Tool B |
| 2019-12-22 | 19 | Tool C |
| 2019-12-23 | 7 | Tool A |
| 2019-12-23 | 4 | Tool B |
| 2019-12-23 | 17 | Tool C |
The problem is that, if there are no tool launches on a particular day, then there is no record in the table representing 0 tool launches that day. In the example above, there were no Tool A launches on 12/21/2019. The absence of a 0 launches record corrupts the visualization of this data (e.g., in a bar chart), since days with 0 launches are not represented.
I cannot come up with a query that uses a calendar table to generate "0 launches" for each tool in the by_tool_by_day table. It is easy to do if all of the records in the table are for a single tool. But I cannot do it if there are multiple tools, and hence multiple date series, that need to be "complete."
Upvotes: 1
Views: 71
Reputation: 4085
I tried to load the same data that you showed here on BigQuery
As you can see below, at first we have to create a date array for the days you want to fill. You can change the date range for the time you need.
Then we create an auxiliary temporary table by performing a cross join between the dates we generated and all the distinct tool names.
Finally, we can select the day, the tool name and the number of launches (using IFNULL for treating specific cases where there are no launches for such tool in such date) We perform a LEFT JOIN from the auxiliary table to keep all the dates and tool names.
WITH
date_range AS (
SELECT
dates
FROM
UNNEST(GENERATE_DATE_ARRAY(DATE '2019-12-01', DATE '2019-12-31', INTERVAL 1 DAY)) AS dates ),
aux AS(
SELECT
date_range.dates AS day,
tool_name,
FROM
date_range
CROSS JOIN (
SELECT
DISTINCT tool_name
FROM
`deploy.tools`) t )
SELECT
aux.day,
aux.tool_name,
IFNULL(t.num_launches,
0) AS num_launches
FROM
aux
LEFT JOIN
`deploy.tools` t
ON
aux.day = t.day
AND aux.tool_name = t.tool_name
ORDER BY
aux.day,
aux.tool_name
Upvotes: 0
Reputation: 1808
#standardSQL
WITH
DATES AS (
SELECT
gen_date
FROM
UNNEST(GENERATE_DATE_ARRAY(DATE '2019-12-01', DATE '2019-12-23', INTERVAL 1 DAY)) AS gen_date
),
TOOLS AS (
SELECT 'ToolA' tool_name UNION ALL
SELECT 'ToolB' tool_name UNION ALL
SELECT 'ToolC' tool_name
),
AGG AS (
SELECT DATE '2019-12-20' day, 1 num_launches, 'ToolA' tool_name UNION ALL
SELECT DATE '2019-12-20', 11, 'ToolB' UNION ALL
SELECT DATE '2019-12-20', 30, 'ToolC' UNION ALL
SELECT DATE '2019-12-21', 14, 'ToolB' UNION ALL
SELECT DATE '2019-12-22', 19, 'ToolC' UNION ALL
SELECT DATE '2019-12-23', 7, 'ToolA' UNION ALL
SELECT DATE '2019-12-23', 4, 'ToolB' UNION ALL
SELECT DATE '2019-12-23', 17, 'ToolC'
)
SELECT
D.gen_date AS day,
T.tool_name,
IFNULL(A.num_launches, 0) AS num_launches
FROM
DATES D
CROSS JOIN
TOOLS T
LEFT JOIN
AGG A
ON
T.tool_name = A.tool_name
AND D.gen_date=A.day
ORDER BY
D.gen_date
Let me know if it works for you?
Upvotes: 2