Reputation: 516
I want to calculate the monthly average of some data using SQL query where the data resides in redshift DB. The data is present in the following format in the table.
s_date | sales
------------+-------
2020-08-04 | 10
2020-08-05 | 20
---- | --
---- | --
The data may not be present for all the date in a month. If the data is not present for a day, it should be considered as 0.
Following query using AVG() function "group by" month as gives the average of based on the data on available date.
select trunc(date_trunc('MONTH', s_date)::timestamp) as month, avg(sales) from sales group by month;
However it does not consider the data for missing dates as 0. What should be the right query to calculate the monthly average as expected?
One more expectation is that, for the current month, the average should be calculated based on the data till today. So it should not consider entire month (like 30 or 31 days).
Regards,
Paul
Upvotes: 2
Views: 1787
Reputation: 521639
Using a calendar table might be the easiest way to go here:
WITH dates AS (
SELECT date_trunc('day', t)::date AS dt
FROM generate_series('2020-01-01'::timestamp, '2020-12-31'::timestamp, '1 day'::interval) t
),
cte AS (
SELECT t.dt, COALESCE(SUM(s.sales), 0) AS sales
FROM dates t
LEFT JOIN sales s ON t.dt = s.s_date
GROUP BY t.dt
)
SELECT
LEFT(dt::text, 7) AS ym,
AVG(sales) AS avg_sales
FROM cte
GROUP BY
LEFT(dt::text, 7);
The logic here is to first generate an intermediate table in the second CTE which has one record for each data in your data set, along with the total sales for that date. Then, we aggregate by year/month, and report the average sales.
Upvotes: 2