Reputation: 3668
I have the query below, which takes about 15-20 secs to run.
with cte0 as (
    SELECT label, date,
           CASE WHEN Lead(label || date || "number") OVER (PARTITION BY label || date || "number"
                                                           ORDER BY "label", "date", "number", "time") IS NULL
                THEN '1'::numeric ELSE '0'::numeric END As "unique"
    FROM table_data
    LEFT JOIN table_mapper ON table_mapper."type" = table_data."type"
    WHERE Date BETWEEN date_trunc('month', current_date - 1) and current_date - 1
)
SELECT 'MTD' as "label", round(sum("unique") / count("unique") *100,1) as "value"
FROM cte0
WHERE "date" BETWEEN date_trunc('month', current_date - 1) AND current_date -1
UNION ALL
SELECT 'Week' as "label", round(sum("unique") / count("unique") *100,1) as "value"
FROM cte0
WHERE "date" BETWEEN date_trunc('week', current_date - 1) AND current_date -1
UNION ALL
SELECT 'FTD' as "label", round(sum("unique") / count("unique") *100,1) as "value"
FROM cte0
WHERE "date" = current_date -1
In the table table_data I have an index on the date column.
CREATE INDEX ix_cli_date ON table_data USING btree (date);
Table Definition (\d table_data)
Table "public.table_data"
  Column   |          Type          | Modifiers
-----------+------------------------+-----------
 date      | date                   | not null
 number    | bigint                 | not null
 time      | time without time zone | not null
 end time  | time without time zone | not null
 duration  | integer                | not null
 time1     | integer                | not null
 time2     | integer                | not null
 time3     | integer                | not null
 time4     | integer                | not null
 time5     | integer                | not null
 time6     | integer                | not null
 time7     | integer                | not null
 type      | text                   | not null
 name      | text                   | not null
 id1       | integer                | not null
 id2       | integer                | not null
 key       | integer                | not null
 status    | text                   | not null
Indexes:
    "ix_cli_date" btree (date)
Table Definition (\d table_mapper)
Table "public.table_mapper"
 Column | Type | Modifiers
--------+------+-----------
 type   | text | not null
 label  | text | not null
 label2 | text | not null
 label3 | text | not null
 label4 | text | not null
 label5 | text | not null
Query plan (EXPLAIN ANALYZE):
Result  (cost=184342.66..230332.86 rows=3 width=64) (actual time=23377.923..25695.478 rows=3 loops=1)
  CTE cte0
    ->  WindowAgg  (cost=121516.06..156751.65 rows=612793 width=23) (actual time=14578.000..18985.958 rows=696157 loops=1)
          ->  Sort  (cost=121516.06..123048.04 rows=612793 width=23) (actual time=14577.975..17084.405 rows=696157 loops=1)
                Sort Key: (((table_mapper.label || (table_data.date)::text) || (table_data."number")::text)), table_mapper.label, table_data.date, table_data."number", table_data."time"
                Sort Method: external merge  Disk: 39480kB
                ->  Hash Left Join  (cost=11.96..37474.21 rows=612793 width=23) (actual time=1.449..3308.718 rows=696157 loops=1)
                      Hash Cond: (table_data."type" = table_mapper."type")
                      ->  Index Scan using ix_cli_date on table_data  (cost=0.02..29036.36 rows=612793 width=38) (actual time=0.141..946.648 rows=696157 loops=1)
                            Index Cond: ((date >= date_trunc('month'::text, ((('now'::text)::date - 1))::timestamp with time zone)) AND (date
                      ->  Hash  (cost=7.53..7.53 rows=353 width=25) (actual time=1.275..1.275 rows=336 loops=1)
                            Buckets: 1024  Batches: 1  Memory Usage: 15kB
                            ->  Seq Scan on table_mapper  (cost=0.00..7.53 rows=353 width=25) (actual time=0.020..0.589 rows=336 loops=1)
  ->  Append  (cost=27591.00..73581.21 rows=3 width=64) (actual time=23377.920..25695.467 rows=3 loops=1)
        ->  Aggregate  (cost=27591.00..27591.02 rows=1 width=32) (actual time=23377.917..23377.918 rows=1 loops=1)
              ->  CTE Scan on cte0  (cost=0.00..27575.68 rows=3064 width=32) (actual time=14578.052..22335.236 rows=696157 loops=1)
                    Filter: ((date = date_trunc('month'::text, ((('now'::text)::date - 1))::timestamp with time zone)))
        ->  Aggregate  (cost=27591.00..27591.02 rows=1 width=32) (actual time=1741.509..1741.510 rows=1 loops=1)
              ->  CTE Scan on cte0  (cost=0.00..27575.68 rows=3064 width=32) (actual time=20.009..1522.352 rows=168261 loops=1)
                    Filter: ((date = date_trunc('week'::text, ((('now'::text)::date - 1))::timestamp with time zone)))
        ->  Aggregate  (cost=18399.11..18399.13 rows=1 width=32) (actual time=576.029..576.030 rows=1 loops=1)
              ->  CTE Scan on cte0  (cost=0.00..18383.79 rows=3064 width=32) (actual time=9.308..546.735 rows=23486 loops=1)
                    Filter: (date = (('now'::text)::date - 1))
Total runtime: 25710.506 ms
Description:
I'm taking the unique count and the repeated count from table_data, and LEAD helps me here: it lets me flag the last occurrence of a repeated value in a column with 1 and the earlier repeats with 0. Suppose I have 3 x in a column: the first 2 x get the value 0 and the 3rd x gets 1.
Through a CTE I take all rows from table_data for a defined date range, concatenate the strings, and use lead() to give each row a 1 or 0 value as per the criteria: if the lead is NULL the row counts as 1, and if it is not NULL then 0.
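For illustration, the flag behaves like this on made-up toy data (hypothetical values, not my real table):

-- Hypothetical toy data: the last row of each group sees lead(...) IS NULL and gets 1,
-- the earlier repeats get 0.
SELECT val
     , CASE WHEN lead(val) OVER (PARTITION BY val ORDER BY ord) IS NULL
            THEN 1 ELSE 0 END AS uni
FROM  (VALUES ('x', 1), ('x', 2), ('x', 3), ('y', 1)) AS t(val, ord);
-- 'x' returns 0, 0, 1 and 'y' returns 1: sum(uni) = 2 distinct values out of count(*) = 4 rows.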
Then I return 3 rows, MTD, Current Week and FTD respectively, each calculated by taking the sum() of that flag I got from the lead and dividing it by the count(*) of all rows. For MTD the sum and count cover the current month, for Week it's the current week, and FTD is for yesterday.
Upvotes: 4
Views: 708
Reputation: 658907
WITH cte AS (
SELECT d.thedate
, lead(m.label) OVER (PARTITION BY m.label, d.thedate, d.number
ORDER BY d.thetime) AS leader
FROM table_data d
LEFT JOIN table_mapper m USING (type)
WHERE thedate BETWEEN date_trunc('month', current_date - 1)
AND current_date - 1
)
SELECT 'MTD' AS label, round(count(leader)::numeric / count(*) * 100, 1) AS val
FROM cte
UNION ALL
SELECT 'Week', round(count(leader)::numeric / count(*) * 100, 1)
FROM cte
WHERE thedate BETWEEN date_trunc('week', current_date - 1) AND current_date - 1
UNION ALL
SELECT 'FTD', round(count(leader)::numeric / count(*) * 100, 1)
FROM cte
WHERE thedate = current_date - 1;
The CTE makes sense for big tables, so the base table is only scanned once. For smaller tables it may be faster without ...
Using thedate instead of the reserved word date (reserved in standard SQL); thetime and uni instead of time and unique; etc.
Simplified the lead() call. You get a value, or NULL when there is no following row in the partition; that seems to be the only relevant information.
It's a pointless waste to repeat columns from the PARTITION BY clause in the ORDER BY clause of a window function.
Building on that, count(leader) / count(*) instead of sum(uni) / count(uni) is a bit faster: count(column) only counts non-null values, while count(*) counts all rows.
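A quick way to see the difference on throwaway values (hypothetical data, just for illustration):

SELECT count(leader) AS non_null_rows   -- counts only non-null values: 2
     , count(*)      AS all_rows        -- counts every row: 3
FROM  (VALUES ('a'), (NULL::text), ('b')) AS t(leader);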
The condition for the first term of the UNION query was redundant.
More advice and links about data definition in the comments to the question.
You should have primary keys. I suggest a serial or IDENTITY column as surrogate PK for table_data:
ALTER TABLE table_data ADD COLUMN table_data_id serial PRIMARY KEY;
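Or, if you are on Postgres 10 or later, an IDENTITY column does the same job. A sketch (assuming the column name table_data_id is free):

ALTER TABLE table_data
  ADD COLUMN table_data_id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY;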
Make type the primary key of table_mapper (also needed for the following FK constraint):
ALTER TABLE table_mapper ADD CONSTRAINT table_mapper_pkey PRIMARY KEY (type);
Add a foreign key constraint for type to enforce referential integrity. Something like:
ALTER TABLE table_data ADD CONSTRAINT table_data_type_fkey
FOREIGN KEY (type) REFERENCES table_mapper (type)
ON UPDATE CASCADE ON DELETE NO ACTION;
For ultimate read performance (at some cost for writes), add a multi-column index to possibly allow index-only scans for the above query:
CREATE INDEX table_data_foo_idx ON table_data (thedate, number, thetime);
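One way to check whether the planner actually picks an index-only scan (assumptions: the visibility map is reasonably up to date, and the test query touches only indexed columns):

VACUUM ANALYZE table_data;   -- sets the visibility map and refreshes statistics
EXPLAIN (ANALYZE, BUFFERS)
SELECT thedate, "number", thetime
FROM   table_data
WHERE  thedate BETWEEN date_trunc('month', current_date - 1) AND current_date - 1;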
Upvotes: 2
Reputation: 1271003
As your query is written, you are referring to the CTE three times. Instead, you can use conditional aggregation if you are willing to have the values in three columns rather than three rows:
SELECT round(sum("date" BETWEEN date_trunc('month', current_date - 1) AND current_date -1 then "unique" else 0 END)) /
sum("date" BETWEEN date_trunc('month', current_date - 1) AND current_date -1 then 1 else 0 END)) *100,1) as mtd
. . .
FROM CTE
This may speed up the query. In addition, you could then incorporate this logic into the CTE query itself, eliminating the materialization step as well.
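A sketch of what that folded-in version might look like, keeping the flag logic from the question in a plain subquery instead of a separate CTE (assumptions: same tables and column names as the question; no guard against empty date slices):

SELECT round(sum(uni) / count(*) * 100, 1) AS mtd
     , round(sum(CASE WHEN "date" >= date_trunc('week', current_date - 1) THEN uni ELSE 0 END) /
             sum(CASE WHEN "date" >= date_trunc('week', current_date - 1) THEN 1 ELSE 0 END) * 100, 1) AS week
     , round(sum(CASE WHEN "date" = current_date - 1 THEN uni ELSE 0 END) /
             sum(CASE WHEN "date" = current_date - 1 THEN 1 ELSE 0 END) * 100, 1) AS ftd
FROM  (
   -- same 1/0 flag as in the question, computed once per row
   SELECT "date"
        , CASE WHEN lead(label || "date" || "number")
                    OVER (PARTITION BY label || "date" || "number" ORDER BY "time") IS NULL
               THEN 1::numeric ELSE 0::numeric END AS uni
   FROM   table_data
   LEFT   JOIN table_mapper ON table_mapper."type" = table_data."type"
   WHERE  "date" BETWEEN date_trunc('month', current_date - 1) AND current_date - 1
   ) sub;   -- every row is already inside the current month, so MTD can use count(*)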
Upvotes: 1