landslide
landslide

Reputation: 107

Generating a complete date series when multiple date series must be represented in a single table

We collect data about the software tools used across our team. One table we use to represent tool use across our team simply counts the number of times that tool was launched per day across the team:

SELECT
  day 
  , num_launches
  , tool_name
FROM
  dataset.by_tool_by_day
ORDER BY day ASC
 ;

Gives you:

| day         | num_launches | tool_name |
|-------------|--------------|-----------|
| 2019-12-20  | 1            | Tool A    |
| 2019-12-20  | 11           | Tool B    |
| 2019-12-20  | 30           | Tool C    |
| 2019-12-21  | 14           | Tool B    |
| 2019-12-22  | 19           | Tool C    |
| 2019-12-23  | 7            | Tool A    |
| 2019-12-23  | 4            | Tool B    |
| 2019-12-23  | 17           | Tool C    |

The problem is that, if there are no tool launches on a particular day, then there is no record in the table representing 0 tool launches that day. In the example above, there were no Tool A launches on 12/21/2019. The absence of a 0 launches record corrupts the visualization of this data (e.g., in a bar chart), since days with 0 launches are not represented.

I cannot come up with a query that uses a calendar table to generate "0 launches" for each tool in the by_tool_by_day table. It is easy to do if all of the records in the table are for a single tool. But I cannot do it if there are multiple tools, and hence multiple date series, that need to be "complete."

Upvotes: 1

Views: 71

Answers (2)

rmesteves
rmesteves

Reputation: 4085

I tried to load the same data that you showed here on BigQuery

As you can see below, at first we have to create a date array for the days you want to fill. You can change the date range for the time you need.

Then we create an auxiliary temporary table by performing a cross join between the dates we generated and all the distinct tool names.

Finally, we can select the day, the tool name and the number of launches (using IFNULL for treating specific cases where there are no launches for such tool in such date) We perform a LEFT JOIN from the auxiliary table to keep all the dates and tool names.

WITH
date_range AS (
  SELECT
    dates
  FROM
    UNNEST(GENERATE_DATE_ARRAY(DATE '2019-12-01', DATE '2019-12-31', INTERVAL 1 DAY)) AS dates ),
aux AS(
  SELECT
    date_range.dates AS day,
    tool_name,
  FROM
    date_range
  CROSS JOIN (
    SELECT
      DISTINCT tool_name
    FROM
      `deploy.tools`) t )
SELECT
  aux.day,
  aux.tool_name,
  IFNULL(t.num_launches,
    0) AS num_launches
FROM
  aux
LEFT JOIN
  `deploy.tools` t
ON
  aux.day = t.day
  AND aux.tool_name = t.tool_name
ORDER BY
  aux.day,
  aux.tool_name

Upvotes: 0

fpopic
fpopic

Reputation: 1808

#standardSQL

WITH 

DATES AS (
  SELECT 
    gen_date 
  FROM 
    UNNEST(GENERATE_DATE_ARRAY(DATE '2019-12-01', DATE '2019-12-23', INTERVAL 1 DAY)) AS gen_date
),

TOOLS AS (
  SELECT 'ToolA' tool_name UNION ALL
  SELECT 'ToolB' tool_name UNION ALL
  SELECT 'ToolC' tool_name 
),

AGG AS (
  SELECT DATE '2019-12-20' day,  1 num_launches, 'ToolA' tool_name UNION ALL
  SELECT DATE '2019-12-20', 11, 'ToolB' UNION ALL
  SELECT DATE '2019-12-20', 30, 'ToolC' UNION ALL
  SELECT DATE '2019-12-21', 14, 'ToolB' UNION ALL
  SELECT DATE '2019-12-22', 19, 'ToolC' UNION ALL
  SELECT DATE '2019-12-23',  7, 'ToolA' UNION ALL
  SELECT DATE '2019-12-23',  4, 'ToolB' UNION ALL
  SELECT DATE '2019-12-23', 17, 'ToolC'
)

SELECT
  D.gen_date AS day,
  T.tool_name,
  IFNULL(A.num_launches, 0) AS num_launches
FROM
  DATES D
CROSS JOIN 
  TOOLS T
LEFT JOIN
  AGG A
ON
  T.tool_name = A.tool_name 
  AND D.gen_date=A.day
ORDER BY
  D.gen_date

Let me know if it works for you?

Upvotes: 2

Related Questions