Reputation: 4429
I have a query which get data summarised between two dates like so:
SELECT date(created_at),
COUNT(COALESCE(id, 0)) AS total_orders,
SUM(COALESCE(total_price, 0)) AS total_price,
SUM(COALESCE(taxes, 0)) AS taxes,
SUM(COALESCE(shipping, 0)) AS shipping,
AVG(COALESCE(total_price, 0)) AS average_order_value,
SUM(COALESCE(total_discount, 0)) AS total_discount,
SUM(total_price - COALESCE(taxes, 0) - COALESCE(shipping, 0) - COALESCE(total_discount, 0)) as net_sales
FROM orders
WHERE shop_id = 43
AND orders.active = true
AND orders.created_at >= '2022-07-20'
AND orders.created_at <= '2022-07-26'
GROUP BY date (created_at)
order by created_at::date desc
However for dates that do not have any orders, the query returns nothing and I'd like to return 0
.
I have tried with COALESCE
but that doesn't seem to do the trick?
Any suggestions?
Upvotes: 1
Views: 1289
Reputation: 656301
This should be substantially faster - and correct:
SELECT *
, total_price - taxes - shipping - total_discount AS net_sales -- ⑤
FROM (
SELECT created_at
, COALESCE(total_orders , 0) AS total_orders
, COALESCE(total_price , 0) AS total_price
, COALESCE(taxes , 0) AS taxes
, COALESCE(shipping , 0) AS shipping
, COALESCE(average_order_value , 0) AS average_order_value
, COALESCE(total_discount , 0) AS total_discount
FROM generate_series(timestamp '2022-07-20' -- ①
, timestamp '2022-07-26'
, interval '1 day') AS g(created_at)
LEFT JOIN ( -- ③
SELECT created_at::date
, count(*) AS total_orders -- ⑥
, sum(total_price) AS total_price
, sum(taxes) AS taxes
, sum(shipping) AS shipping
, avg(total_price) AS average_order_value
, sum(total_discount) AS total_discount
FROM orders
WHERE shop_id = 43
AND active -- simpler
AND created_at >= '2022-07-20'
AND created_at < '2022-07-27' -- ② !
GROUP BY 1
) o USING (created_at) -- ④
) sub
ORDER BY created_at DESC;
db<>fiddle here
I copied, simplified, and extended Xu's fiddle for comparison.
① Why this particular form for generate_series()
? See:
② Assuming created_at
is data type timestamp
your original formulation is most probably incorrect. created_at <= '2022-07-26'
would include the first instant of '2022-07-26' and exclude the rest. To include all of '2022-07-26', use created_at < '2022-07-27'
. See:
③ The LEFT JOIN
is the core feature of this answer. Generate all days with generate_series()
, independently aggregate days from table orders
, then LEFT JOIN
to retain one row per day like you requested.
④ I made the column name match created_at
, so we can conveniently shorten the join syntax with the USING
clause.
⑤ Compute net_sales
in an outer SELECT
after replacing NULL values, so we need COALESCE()
only once.
⑥ count(*)
is equivalent to COUNT(COALESCE(id, 0))
in any case, but cheaper. See:
Upvotes: 2
Reputation: 142
Please refer to the below script.
SELECT *
FROM
(SELECT date(created_at) AS created_at,
COUNT(id) AS total_orders,
SUM(total_price) AS total_price,
SUM(taxes) AS taxes,
SUM(shipping) AS shipping,
AVG(total_price) AS average_order_value,
SUM(total_discount) AS total_discount,
SUM(total_price - taxes - shipping - total_discount) AS net_sales
FROM orders
WHERE shop_id = 43
AND orders.active = true
AND orders.created_at >= '2022-07-20'
AND orders.created_at <= '2022-07-26'
GROUP BY date (created_at)
UNION
SELECT dates AS created_at,
0 AS total_orders,
0 AS total_price,
0 AS taxes,
0 AS shipping,
0 AS average_order_value,
0 AS total_discount,
0 AS net_sales
FROM generate_series('2022-07-20', '2022-07-26', interval '1 day') AS dates
WHERE dates NOT IN
(SELECT created_at
FROM orders
WHERE shop_id = 43
AND orders.active = true
AND orders.created_at >= '2022-07-20'
AND orders.created_at <= '2022-07-26' ) ) a
ORDER BY created_at::date desc;
There is one sample for your reference. Sample
I got your duplicate test cases at my side. The root cause is created_at field (datattype:timestamp), hence there are duplicate lines.
Below script is correct for your request.
SELECT *
FROM
(SELECT date(created_at) AS created_at,
COUNT(id) AS total_orders,
SUM(total_price) AS total_price,
SUM(taxes) AS taxes,
SUM(shipping) AS shipping,
AVG(total_price) AS average_order_value,
SUM(total_discount) AS total_discount,
SUM(total_price - taxes - shipping - total_discount) AS net_sales
FROM orders
WHERE shop_id = 43
AND orders.active = true
AND orders.created_at >= '2022-07-20'
AND orders.created_at <= '2022-07-26'
GROUP BY date (created_at)
UNION
SELECT dates AS created_at,
0 AS total_orders,
0 AS total_price,
0 AS taxes,
0 AS shipping,
0 AS average_order_value,
0 AS total_discount,
0 AS net_sales
FROM generate_series('2022-07-20', '2022-07-26', interval '1 day') AS dates
WHERE dates NOT IN
(SELECT date (created_at)
FROM orders
WHERE shop_id = 43
AND orders.active = true
AND orders.created_at >= '2022-07-20'
AND orders.created_at <= '2022-07-26' ) ) a
ORDER BY created_at::date desc;
Here is a sample that's same with your side. Link
Upvotes: 1
Reputation: 430
You can use WITH RECURSIVE to build a table of dates and then select dates that are not in your table
WITH RECURSIVE t(d) AS (
(SELECT '2015-01-01'::date)
UNION ALL
(SELECT d + 1 FROM t WHERE d + 1 <= '2015-01-10')
) SELECT d FROM t WHERE d NOT IN (SELECT d_date FROM tbl);
[look on this post : ][1]
Upvotes: 0