Reputation: 1463
I am working with following sample data;
dt | ship_id | audit_id | action
2022-01-02 | 1351 | id1 | destroy
2022-01-01 | 1351 | id1 | create
2021-12-12 | 3457 | id2 | create
2021-12-16 | 3457 | id2 | destroy
2021-12-28 | 3457 | id3 | create
To give some context, for a given ship_id
, and audit_id
; an entry has to be created before it is destroyed as defined by action
column. For example, for ship_id=3457, and audit_id=id2; got created on Dec 12 and destroyed on Dec 16.
The goal is to get, for every dt
(when action is created), how many audit_ids are created before it , and how many audit_ids are destroyed before it.
Sample output:
dt | created_cnt | destroyed_cnt
2022-01-01 | 2 | 1
Possible Approach Using self join idea.
select
audit_id,
ship_id,
max(case when action = 'create' then dt end) as creation_time,
max(case when action = 'destroy' then dt end) as removal_time
from table
group by 1,2)
select
t1.creation_time as creation_date,
count(t2.audit_id) as created_cnt,
count(distinct case when t2.removal_time < t1.creation_time then t2.audit_id end) as
destroyed_cnt
from cte as t1
left join cte as t2 on t1.creation_time > t2.creation_time
group by 1
order by 1 desc;
But due to large table, this self-join is slowing things down. Is it possible to use some sort of window functions here to replace joining? Help is appreciated.
Upvotes: 4
Views: 1696
Reputation: 25968
jiggle Filipe's answer the sum(iff(action='create',1,0))
can be swapped for count_if(action='create')
thus become:
with data as (
select $1 dt, $2 ship, $3 audit, $4 action
from values('2022-01-02', 1, 'id1', 'destroy')
,('2022-01-01', 1, 'id1', 'create')
,('2021-12-12', 2, 'id2', 'create')
,('2020-12-16', 2, 'id2', 'destroy')
,('2020-12-28', 2, 'id3', 'create')
)
select dt
,count_if(action='create') over (order by dt rows between unbounded preceding and 1 preceding) created_cnt
,count_if(action='destroy') over (order by dt rows between unbounded preceding and 1 preceding) destroyed_cnt
from data
Upvotes: 1
Reputation: 1804
An alternative answer using PIVOT instead of IFF(). Would be interested to hear which approach scales best for your problem.
Code (Copy|Paste|Run):
with data as (
select $1 dt, $2 ship, $3 audit, $4 action
from values('2022-01-02', 1, 'id1', 'destroy')
, ('2022-01-01', 1, 'id1', 'create')
, ('2021-12-12', 2, 'id2', 'create')
, ('2020-12-16', 2, 'id2', 'destroy')
, ('2020-12-28', 2, 'id3', 'create')
)
select
dt
, sum($3) over (order by dt rows between unbounded preceding and 1 preceding) created_cnt
, sum($4) over (order by dt rows between unbounded preceding and 1 preceding) destroyed_cnt
from
data pivot ( count (audit) for action in ('create','destroy'));
Upvotes: 3
Reputation: 59175
Check this solution with over(order by dt rows between unbounded preceding and 1 preceding)
:
with data as (
select $1 dt, $2 ship, $3 audit, $4 action
from values('2022-01-02', 1, 'id1', 'destroy')
, ('2022-01-01', 1, 'id1', 'create')
, ('2021-12-12', 2, 'id2', 'create')
, ('2020-12-16', 2, 'id2', 'destroy')
, ('2020-12-28', 2, 'id3', 'create')
)
select dt
, sum(iff(action='create',1,0)) over(order by dt rows between unbounded preceding and 1 preceding) created_cnt
, sum(iff(action='destroy',1,0)) over(order by dt rows between unbounded preceding and 1 preceding) destroyed_cnt
from data
Upvotes: 4