jay
jay

Reputation: 1463

Replacing self joins by window functions

I am working with following sample data;

       dt   |   ship_id     |   audit_id   | action

 2022-01-02 |     1351      |     id1      | destroy
 2022-01-01 |     1351      |     id1      | create
 2021-12-12 |     3457      |     id2      | create
 2021-12-16 |     3457      |     id2      | destroy
 2021-12-28 |     3457      |     id3      | create
 

To give some context, for a given ship_id, and audit_id; an entry has to be created before it is destroyed as defined by action column. For example, for ship_id=3457, and audit_id=id2; got created on Dec 12 and destroyed on Dec 16.

The goal is to get, for every dt(when action is created), how many audit_ids are created before it , and how many audit_ids are destroyed before it.

Sample output:

       dt  |  created_cnt  |  destroyed_cnt

2022-01-01 |     2         |    1

Possible Approach Using self join idea.

 select
  audit_id,
  ship_id,
  max(case when action = 'create' then dt end) as creation_time,
  max(case when action = 'destroy' then dt end) as removal_time

from table 
group by 1,2)

select
 t1.creation_time as creation_date,
 count(t2.audit_id) as created_cnt,
 count(distinct case when  t2.removal_time <  t1.creation_time then t2.audit_id end) as 
    destroyed_cnt

from cte as t1
left join cte as t2 on  t1.creation_time > t2.creation_time
group by 1 
order by 1 desc;

But due to large table, this self-join is slowing things down. Is it possible to use some sort of window functions here to replace joining? Help is appreciated.

Upvotes: 4

Views: 1696

Answers (3)

Simeon Pilgrim
Simeon Pilgrim

Reputation: 25968

jiggle Filipe's answer the sum(iff(action='create',1,0)) can be swapped for count_if(action='create')

thus become:

with data as (
    select $1 dt, $2 ship, $3 audit, $4 action
    from values('2022-01-02', 1, 'id1', 'destroy')
       ,('2022-01-01', 1, 'id1', 'create')
       ,('2021-12-12', 2, 'id2', 'create')
       ,('2020-12-16', 2, 'id2', 'destroy')
       ,('2020-12-28', 2, 'id3', 'create')
)

select dt
    ,count_if(action='create') over (order by dt rows between unbounded preceding and 1 preceding) created_cnt
    ,count_if(action='destroy') over (order by dt rows between unbounded preceding and 1 preceding) destroyed_cnt
from data

Upvotes: 1

Adrian White
Adrian White

Reputation: 1804

An alternative answer using PIVOT instead of IFF(). Would be interested to hear which approach scales best for your problem.

enter image description here

Code (Copy|Paste|Run):

with data as (
select $1 dt, $2 ship, $3 audit, $4 action
from values('2022-01-02', 1, 'id1', 'destroy')
, ('2022-01-01', 1, 'id1', 'create')
, ('2021-12-12', 2, 'id2', 'create')
, ('2020-12-16', 2, 'id2', 'destroy')
, ('2020-12-28', 2, 'id3', 'create')
)

select 
  dt 
, sum($3) over (order by dt rows between unbounded preceding and 1 preceding) created_cnt
, sum($4) over (order by dt rows between unbounded preceding and 1 preceding) destroyed_cnt
from 
data pivot ( count (audit) for action in ('create','destroy'));

Upvotes: 3

Felipe Hoffa
Felipe Hoffa

Reputation: 59175

Check this solution with over(order by dt rows between unbounded preceding and 1 preceding):

with data as (
    select $1 dt, $2 ship, $3 audit, $4 action
    from values('2022-01-02', 1, 'id1', 'destroy')
    , ('2022-01-01', 1, 'id1', 'create')
    , ('2021-12-12', 2, 'id2', 'create')
    , ('2020-12-16', 2, 'id2', 'destroy')
    , ('2020-12-28', 2, 'id3', 'create')
)

select dt
    , sum(iff(action='create',1,0)) over(order by dt rows between unbounded preceding and 1 preceding) created_cnt
    , sum(iff(action='destroy',1,0)) over(order by dt rows between unbounded preceding and 1 preceding) destroyed_cnt
from data

enter image description here

Upvotes: 4

Related Questions