Reputation: 415
I want to count the distinct
day_number
over the past 30 days. However, the distinct function can't be used with over
If I delete distinct
, it will give me the total count of the day_number
, but day_number
can have lots of duplicate. So that's why I want to add distinct
.
select tr.*,
count( distinct day_number) OVER (PARTITION BY ACCOUNT ORDER BY DAY_number range 29 PRECEDING) as result
from table tr;
Can anyone please show me how to count the distinct numbers in the over(partition by..)
statement? Thanks in advance.
Upvotes: 0
Views: 18953
Reputation: 71
You can simulate this by creating an column that will be 1 when a "new break" appears in the ordered list and null otherwise. Then you just have to count or sum these break indicators. And both count() and sum() support "over(order by...)". On your example, it will be:
with TR as
(select 1 as PK, 'ABCDE' as ACCOUNT, 23 as DAY_number from dual
union all select 2, 'ABCDE', 23 from dual
union all select 3, 'ABCDE', 24 from dual
union all select 4, 'ABCDE', 25 from dual
)
select tr.*
, count( /*distinct*/ day_number) OVER (PARTITION BY ACCOUNT ORDER BY DAY_number range 29 PRECEDING) as wrong_result
, count(IS_NEW_BREAK) over(PARTITION BY ACCOUNT order by day_number range 29 PRECEDING) as desired_output
from
(select tr.*
, case min(case when day_number is not null then PK end) over(PARTITION BY ACCOUNT, day_number) when PK then 1 end as IS_NEW_BREAK
from tr
) tr;
Upvotes: 0
Reputation: 43
The count(distinct ...)
works fine with the over clause, the main problem it's the order by. You can't do count (distinct ..) over (partition by ... order by ...)
because DISTINCT functions and RATIO_TO_REPORT cannot have an ORDER BY. So i did this:
select tr.*, count (distinct day_number) over (partition by account)
from (select t.*, row_number() over (partition by account order by day_number) row_number from table t) tr
where row_number < 30;
I've tested it in employees scheme of HR (Free oracle scheme that you can find everywhere) I'm not sure it will works in your schema because i have not a replica of it, but, if not, it should give you some ideas:
select count (distinct manager_id) over (partition by department_id), department_id, manager_id
from (select e.*, row_number() over (partition by department_id order by employee_id) row_number from employees e)
where row_number < 30;
Upvotes: 3
Reputation: 23578
You can do this by first creating a column that only lists each id once, and then doing a range-count on that column, e.g.:
WITH sd AS (SELECT 1 ID, 10 val FROM dual UNION ALL
SELECT 1 ID, 20 val FROM dual UNION ALL
SELECT 2 ID, 30 val FROM dual UNION ALL
SELECT 2 ID, 40 val FROM dual UNION ALL
SELECT 4 ID, 50 val FROM dual UNION ALL
SELECT 4 ID, 60 val FROM dual UNION ALL
SELECT 6 ID, 70 val FROM dual)
SELECT ID,
val,
COUNT(id_distinct) OVER (ORDER BY ID RANGE 3 PRECEDING) cnt_disinct_ids
FROM (SELECT ID,
val,
CASE WHEN row_number() OVER (PARTITION BY ID ORDER BY val) = 1 THEN ID END id_distinct
FROM sd);
ID VAL CNT_DISINCT_IDS
---------- ---------- ---------------
1 10 1
1 20 1
2 30 2
2 40 2
4 50 3
4 60 3
6 70 2
ETA: proof that the above technique works for your data:
WITH your_table AS (SELECT 'ABCDE' account_sk, 23 day_sk FROM dual UNION ALL
SELECT 'ABCDE' account_sk, 23 day_sk FROM dual UNION ALL
SELECT 'ABCDE' account_sk, 24 day_sk FROM dual UNION ALL
SELECT 'ABCDE' account_sk, 25 day_sk FROM dual UNION ALL
SELECT 'ABCDE' account_sk, 53 day_sk FROM dual UNION ALL
SELECT 'ABCDE' account_sk, 53 day_sk FROM dual UNION ALL
SELECT 'ABCDE' account_sk, 55 day_sk FROM dual UNION ALL
SELECT 'VWXYZ' account_sk, 10 day_sk FROM dual UNION ALL
SELECT 'VWXYZ' account_sk, 12 day_sk FROM dual UNION ALL
SELECT 'VWXYZ' account_sk, 40 day_sk FROM dual UNION ALL
SELECT 'VWXYZ' account_sk, 40 day_sk FROM dual)
SELECT account_sk,
day_sk,
COUNT(day_sk_distinct) OVER (PARTITION BY account_sk ORDER BY day_sk RANGE BETWEEN 29 PRECEDING AND CURRENT ROW) count_distinct_day_sk
FROM (SELECT account_sk,
day_sk,
CASE WHEN row_number() OVER (PARTITION BY account_sk, day_sk ORDER BY day_sk) = 1 THEN day_sk END day_sk_distinct
FROM your_table);
ACCOUNT_SK DAY_SK COUNT_DISTINCT_DAY_SK
---------- ---------- ---------------------
ABCDE 23 1
ABCDE 23 1
ABCDE 24 2
ABCDE 25 3
ABCDE 53 3
ABCDE 53 3
ABCDE 55 2
VWXYZ 10 1
VWXYZ 12 2
VWXYZ 40 2
VWXYZ 40 2
Upvotes: 1