Calculate the monthly average, including dates where data is missing

Question

I want to calculate the monthly average of some data with a SQL query. The data resides in a Redshift database, in a table with the following format:

   s_date   | sales 
------------+-------
 2020-08-04 |    10
 2020-08-05 |    20
   ----     |    --
   ----     |    --

Data may not be present for every date in a month. If there is no data for a day, that day should be counted as 0.
The following query, which uses the AVG() function and groups by month, gives the average based only on the dates for which data is available.

select trunc(date_trunc('MONTH', s_date)::timestamp) as month, avg(sales) from sales group by month;

However, it does not treat missing dates as 0. What would be the right query to calculate the monthly average as expected?

One more expectation: for the current month, the average should be calculated only over the days up to today, so it should not divide by the entire month (30 or 31 days).

Regards,
Paul

Tim Biegeleisen · Accepted Answer

Using a calendar table might be the easiest way to go here:

WITH dates AS (
    -- Calendar: one row per day of 2020
    SELECT date_trunc('day', t)::date AS dt
    FROM generate_series('2020-01-01'::timestamp, '2020-12-31'::timestamp, '1 day'::interval) t
),
cte AS (
    -- Total sales per calendar day; days with no sales rows become 0
    SELECT t.dt, COALESCE(SUM(s.sales), 0) AS sales
    FROM dates t
    LEFT JOIN sales s ON t.dt = s.s_date
    GROUP BY t.dt
)

-- Average the daily totals per year-month
SELECT
    LEFT(dt::text, 7) AS ym,
    AVG(sales) AS avg_sales
FROM cte
GROUP BY
    LEFT(dt::text, 7);

The logic here is to first generate an intermediate table in the second CTE which has one record for each date in the calendar, along with the total sales for that date (0 when the date has no sales rows). Then, we aggregate by year/month, and report the average sales.
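
The same logic can be checked outside the database. The following is a minimal Python sketch, not a translation of the SQL above: it walks a calendar of days, counts missing dates as 0, and averages per month. The function name `monthly_average` and its `start`/`end` parameters are hypothetical, chosen only for illustration. Passing today's date as `end` makes the current month divide only by the days elapsed so far, which is the second requirement in the question.

```python
from collections import defaultdict
from datetime import date, timedelta


def monthly_average(sales_by_date, start, end):
    """Average daily sales per month over every calendar day in [start, end].

    Hypothetical helper: sales_by_date maps a date to that day's total sales;
    dates missing from it count as 0.
    """
    totals = defaultdict(int)  # "YYYY-MM" -> summed sales
    days = defaultdict(int)    # "YYYY-MM" -> number of calendar days counted
    day = start
    while day <= end:
        month = day.strftime("%Y-%m")
        totals[month] += sales_by_date.get(day, 0)  # same role as COALESCE(..., 0)
        days[month] += 1
        day += timedelta(days=1)
    return {month: totals[month] / days[month] for month in totals}


# The two sample rows from the question.
sales = {date(2020, 8, 4): 10, date(2020, 8, 5): 20}

# Whole month: 30 in sales spread over 31 calendar days.
full_month = monthly_average(sales, date(2020, 8, 1), date(2020, 8, 31))

# Assuming "today" is 2020-08-10: 30 in sales over the 10 days elapsed.
to_date = monthly_average(sales, date(2020, 8, 1), date(2020, 8, 10))
print(full_month, to_date)
```

In the SQL, the corresponding change would presumably be to end the generated series at the current date rather than at a fixed date such as 2020-12-31.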
