stolikp
stolikp

Reputation: 31

Rewriting query to use analytic functions

I have a table with client's id, creation_datetime, company and a binary (0 or 1) column checked. I wrote a very slow query

    select a.id
          ,a.creation_datetime
          ,a.company
          ,count(*) as num_all -- number of all rows
          ,count(distinct b.id) as num_cl -- number of distinct clients
          ,count(case when b.checked = 1 then 1 end) as checked_all -- number of rows checked
          ,count(distinct case when b.checked = 1 then b.id end) as checked_cl -- number of clients checked
    from table a
    join table b on a.company = b.company
                 and b.creation_datetime < a.creation_datetime 
    where add_months(a.creation_datetime, -12) <= b.creation_datetime 
    group by a.id, a.creation_datetime, a.company;

I would like to use analytic functions as they are usually faster than self join on a table (I would like to avoid cartesian product on a large table). My ideas are

    count(*) over (partition by company order by creation_datetime rows between unbounded preceding and 1 preceding) as num_all

    count(case when checked = 1 then 1 end) over (partition by company order by creation_datetime rows between unbounded preceding and 1 preceding) as checked_all

I've two issues here.

  1. count(distinct) doesn't work with order by clause.

  2. I don't know how to impose condition add_months(a.creation_datetime, -12) <= b.creation_datetime. Is it even possible while using analytic functions?

Edit id is not unique. Pairs (id, creation_datetime) are unique. The meaning of (id, creation_datetime) is that client with id applied on creation_datetime. Each application is checked or not. One client can apply many times with many different companies. The goals:

i) for each (id, creation_datetime, company) count the number of applictions (pairs (id, creation_datetime)) having the same company within 12 months,

ii) for each (id, creation_datetime, company) count the number of distinct clients id having the same company within 12 months,

iii) as i) but only checked applications,

iv) as ii) but only checked applications.

Edit 2 Sample data and results http://sqlfiddle.com/#!4/d7156/2 Notice that some rows don't satisfy join conditions and they are excluded from the results. That's fine, I can add rows with zeros later or use left join when the query becomes more efficient.

Edit 3 The rows that do not appear in the results should have zeros. See http://sqlfiddle.com/#!4/d7156/23

Upvotes: 0

Views: 93

Answers (2)

Sayan Malakshinov
Sayan Malakshinov

Reputation: 8655

Looks like you need something like this: http://sqlfiddle.com/#!4/d7156/7

Update: modified as per test data from sqlfiddle.

with pre_range as (
   select
      c.*
     ,to_number(to_char(creation_datetime,'yyyy.mmddhh24miss')) dtnum
   from test_tab c
)
,add_new_clients as (
   select
      p.*
     ,decode(
         count(*)over(partition by company,id order by dtnum range between 1 preceding and 0.00001 preceding)
        ,1,1 -- case when 1 then 1
        ,0,1 -- case when 0 then 1
        ,0   -- else 0
      )  as new_client
   from pre_range p
)
select *
from(
    select a.id,dtnum
           ,a.creation_datetime
           ,a.company
           ,count(*)
              over(partition by company order by dtnum range between 1 preceding and 0.00001 preceding)
            as num_all -- number of all rows
           ,sum(new_client) 
              over(partition by company order by dtnum range between 1 preceding and 0.00001 preceding)
            as num_cl -- number of distinct clients
           ,count(case when checked = 1 then 1 end) 
              over(partition by company order by dtnum range between 1 preceding and 0.00001 preceding) as checked_all -- number of rows checked
           ,sum(case when checked = 1 then new_client else 0 end) 
              over(partition by company order by dtnum range between 1 preceding and 0.00001 preceding) as checked_cl-- number of clients checked
    from add_new_clients a
    )
where num_all>0
order by id,creation_datetime;

Full test case with sample data:

with test_tab(id, creation_datetime, company, checked) as (
select 1, to_date('2019-05-13 13:27:28', 'YYYY-MM-DD HH24:MI:SS'), 'company_a', 0 from dual union all
select 2, to_date('2019-05-12 13:27:28', 'YYYY-MM-DD HH24:MI:SS'), 'company_a', 0 from dual union all
select 3, to_date('2018-05-12 13:26:00', 'YYYY-MM-DD HH24:MI:SS'), 'company_a', 0 from dual union all
select 2, to_date('2019-05-11 13:27:28', 'YYYY-MM-DD HH24:MI:SS'), 'company_a', 1 from dual union all

select 4, to_date('2019-05-13 13:27:28', 'YYYY-MM-DD HH24:MI:SS'), 'company_b', 1 from dual union all
select 1, to_date('2019-05-12 13:27:28', 'YYYY-MM-DD HH24:MI:SS'), 'company_b', 1 from dual union all
select 1, to_date('2018-05-12 13:26:00', 'YYYY-MM-DD HH24:MI:SS'), 'company_b', 0 from dual union all
select 2, to_date('2019-05-11 13:27:28', 'YYYY-MM-DD HH24:MI:SS'), 'company_b', 1 from dual 
)
,pre_range as (
   select
      c.*
     ,to_number(to_char(creation_datetime,'yyyy.mmddhh24miss')) dtnum
   from test_tab c
)
,add_new_clients as (
   select
      p.*
     ,decode(
         count(*)over(partition by company,id order by dtnum range between 1 preceding and 0.00001 preceding)
        ,1,1 -- case when 1 then 1
        ,0,1 -- case when 0 then 1
        ,0   -- else 0
      )  as new_client
   from pre_range p
)
select *
from(
    select a.id,dtnum
           ,a.creation_datetime
           ,a.company
           ,count(*)
              over(partition by company order by dtnum range between 1 preceding and 0.00001 preceding)
            as num_all -- number of all rows
           ,sum(new_client) 
              over(partition by company order by dtnum range between 1 preceding and 0.00001 preceding)
            as num_cl -- number of distinct clients
           ,count(case when checked = 1 then 1 end) 
              over(partition by company order by dtnum range between 1 preceding and 0.00001 preceding) as checked_all -- number of rows checked
           ,sum(case when checked = 1 then new_client else 0 end) 
              over(partition by company order by dtnum range between 1 preceding and 0.00001 preceding) as checked_cl-- number of clients checked
    from add_new_clients a
    )
where num_all>0
order by id,creation_datetime;

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1269753

I would expect a column named id to be unique, so count(distinct) is not necessary:

select a.*, count(*) over (partition by company) as num_all,
       sum(checked) as num_checked_all,
       row_number() over (partition by company order by creation_date) as running_count,
       sum(checked) over (partition by company order by creation_date) as running_checked
from a

Upvotes: 1

Related Questions