Reputation: 31
I have a table with client's id, creation_datetime, company and a binary (0 or 1) column checked. I wrote a very slow query
select a.id
,a.creation_datetime
,a.company
,count(*) as num_all -- number of all rows
,count(distinct b.id) as num_cl -- number of distinct clients
,count(case when b.checked = 1 then 1 end) as checked_all -- number of rows checked
,count(distinct case when b.checked = 1 then b.id end) as checked_cl -- number of clients checked
from table a
join table b on a.company = b.company
and b.creation_datetime < a.creation_datetime
where add_months(a.creation_datetime, -12) <= b.creation_datetime
group by a.id, a.creation_datetime, a.company;
I would like to use analytic functions as they are usually faster than self join on a table (I would like to avoid cartesian product on a large table). My ideas are
count(*) over (partition by company order by creation_datetime rows between unbounded preceding and 1 preceding) as num_all
count(case when checked = 1 then 1 end) over (partition by company order by creation_datetime rows between unbounded preceding and 1 preceding) as checked_all
I've two issues here.
count(distinct) doesn't work with order by clause.
I don't know how to impose condition add_months(a.creation_datetime, -12) <= b.creation_datetime. Is it even possible while using analytic functions?
Edit id is not unique. Pairs (id, creation_datetime) are unique. The meaning of (id, creation_datetime) is that client with id applied on creation_datetime. Each application is checked or not. One client can apply many times with many different companies. The goals:
i) for each (id, creation_datetime, company) count the number of applictions (pairs (id, creation_datetime)) having the same company within 12 months,
ii) for each (id, creation_datetime, company) count the number of distinct clients id having the same company within 12 months,
iii) as i) but only checked applications,
iv) as ii) but only checked applications.
Edit 2 Sample data and results http://sqlfiddle.com/#!4/d7156/2 Notice that some rows don't satisfy join conditions and they are excluded from the results. That's fine, I can add rows with zeros later or use left join when the query becomes more efficient.
Edit 3 The rows that do not appear in the results should have zeros. See http://sqlfiddle.com/#!4/d7156/23
Upvotes: 0
Views: 93
Reputation: 8655
Looks like you need something like this: http://sqlfiddle.com/#!4/d7156/7
Update: modified as per test data from sqlfiddle.
with pre_range as (
select
c.*
,to_number(to_char(creation_datetime,'yyyy.mmddhh24miss')) dtnum
from test_tab c
)
,add_new_clients as (
select
p.*
,decode(
count(*)over(partition by company,id order by dtnum range between 1 preceding and 0.00001 preceding)
,1,1 -- case when 1 then 1
,0,1 -- case when 0 then 1
,0 -- else 0
) as new_client
from pre_range p
)
select *
from(
select a.id,dtnum
,a.creation_datetime
,a.company
,count(*)
over(partition by company order by dtnum range between 1 preceding and 0.00001 preceding)
as num_all -- number of all rows
,sum(new_client)
over(partition by company order by dtnum range between 1 preceding and 0.00001 preceding)
as num_cl -- number of distinct clients
,count(case when checked = 1 then 1 end)
over(partition by company order by dtnum range between 1 preceding and 0.00001 preceding) as checked_all -- number of rows checked
,sum(case when checked = 1 then new_client else 0 end)
over(partition by company order by dtnum range between 1 preceding and 0.00001 preceding) as checked_cl-- number of clients checked
from add_new_clients a
)
where num_all>0
order by id,creation_datetime;
Full test case with sample data:
with test_tab(id, creation_datetime, company, checked) as (
select 1, to_date('2019-05-13 13:27:28', 'YYYY-MM-DD HH24:MI:SS'), 'company_a', 0 from dual union all
select 2, to_date('2019-05-12 13:27:28', 'YYYY-MM-DD HH24:MI:SS'), 'company_a', 0 from dual union all
select 3, to_date('2018-05-12 13:26:00', 'YYYY-MM-DD HH24:MI:SS'), 'company_a', 0 from dual union all
select 2, to_date('2019-05-11 13:27:28', 'YYYY-MM-DD HH24:MI:SS'), 'company_a', 1 from dual union all
select 4, to_date('2019-05-13 13:27:28', 'YYYY-MM-DD HH24:MI:SS'), 'company_b', 1 from dual union all
select 1, to_date('2019-05-12 13:27:28', 'YYYY-MM-DD HH24:MI:SS'), 'company_b', 1 from dual union all
select 1, to_date('2018-05-12 13:26:00', 'YYYY-MM-DD HH24:MI:SS'), 'company_b', 0 from dual union all
select 2, to_date('2019-05-11 13:27:28', 'YYYY-MM-DD HH24:MI:SS'), 'company_b', 1 from dual
)
,pre_range as (
select
c.*
,to_number(to_char(creation_datetime,'yyyy.mmddhh24miss')) dtnum
from test_tab c
)
,add_new_clients as (
select
p.*
,decode(
count(*)over(partition by company,id order by dtnum range between 1 preceding and 0.00001 preceding)
,1,1 -- case when 1 then 1
,0,1 -- case when 0 then 1
,0 -- else 0
) as new_client
from pre_range p
)
select *
from(
select a.id,dtnum
,a.creation_datetime
,a.company
,count(*)
over(partition by company order by dtnum range between 1 preceding and 0.00001 preceding)
as num_all -- number of all rows
,sum(new_client)
over(partition by company order by dtnum range between 1 preceding and 0.00001 preceding)
as num_cl -- number of distinct clients
,count(case when checked = 1 then 1 end)
over(partition by company order by dtnum range between 1 preceding and 0.00001 preceding) as checked_all -- number of rows checked
,sum(case when checked = 1 then new_client else 0 end)
over(partition by company order by dtnum range between 1 preceding and 0.00001 preceding) as checked_cl-- number of clients checked
from add_new_clients a
)
where num_all>0
order by id,creation_datetime;
Upvotes: 1
Reputation: 1269753
I would expect a column named id
to be unique, so count(distinct)
is not necessary:
select a.*, count(*) over (partition by company) as num_all,
sum(checked) as num_checked_all,
row_number() over (partition by company order by creation_date) as running_count,
sum(checked) over (partition by company order by creation_date) as running_checked
from a
Upvotes: 1