vicky
vicky

Reputation: 415

count(distinct) over (partition by... doesn't work in Oracle SQL

enter image description here

I want to count the distinct day_number over the past 30 days. However, the distinct function can't be used with over

If I delete distinct, it will give me the total count of the day_number, but day_number can have lots of duplicate. So that's why I want to add distinct.

select tr.*,
       count( distinct day_number) OVER (PARTITION BY ACCOUNT ORDER BY DAY_number range 29 PRECEDING) as result
from table tr;

Can anyone please show me how to count the distinct numbers in the over(partition by..) statement? Thanks in advance.

Upvotes: 0

Views: 18953

Answers (3)

Olivier Comte
Olivier Comte

Reputation: 71

You can simulate this by creating an column that will be 1 when a "new break" appears in the ordered list and null otherwise. Then you just have to count or sum these break indicators. And both count() and sum() support "over(order by...)". On your example, it will be:

with TR as
(select 1 as PK, 'ABCDE' as ACCOUNT, 23 as DAY_number from dual
union all select 2, 'ABCDE', 23 from dual
union all select 3, 'ABCDE', 24 from dual
union all select 4, 'ABCDE', 25 from dual
)
select tr.*
    ,  count( /*distinct*/ day_number) OVER (PARTITION BY ACCOUNT ORDER BY DAY_number range 29 PRECEDING) as wrong_result
    ,  count(IS_NEW_BREAK) over(PARTITION BY ACCOUNT order by day_number range 29 PRECEDING) as desired_output
    from
    (select tr.*
        ,   case min(case when day_number is not null then PK end) over(PARTITION BY ACCOUNT, day_number) when PK then 1 end as IS_NEW_BREAK
        from tr
    ) tr;

Upvotes: 0

GoodGameSensei
GoodGameSensei

Reputation: 43

The count(distinct ...) works fine with the over clause, the main problem it's the order by. You can't do count (distinct ..) over (partition by ... order by ...) because DISTINCT functions and RATIO_TO_REPORT cannot have an ORDER BY. So i did this:

select tr.*, count (distinct day_number) over (partition by account)
from (select t.*, row_number() over (partition by account order by day_number) row_number from table t) tr
where row_number < 30;

I've tested it in employees scheme of HR (Free oracle scheme that you can find everywhere) I'm not sure it will works in your schema because i have not a replica of it, but, if not, it should give you some ideas:

 select count (distinct manager_id) over (partition by department_id), department_id, manager_id
from (select e.*, row_number() over (partition by department_id order by employee_id) row_number from employees e)
where row_number < 30;

Upvotes: 3

Boneist
Boneist

Reputation: 23578

You can do this by first creating a column that only lists each id once, and then doing a range-count on that column, e.g.:

WITH sd AS (SELECT 1 ID, 10 val FROM dual UNION ALL
            SELECT 1 ID, 20 val FROM dual UNION ALL
            SELECT 2 ID, 30 val FROM dual UNION ALL
            SELECT 2 ID, 40 val FROM dual UNION ALL
            SELECT 4 ID, 50 val FROM dual UNION ALL
            SELECT 4 ID, 60 val FROM dual UNION ALL
            SELECT 6 ID, 70 val FROM dual)
SELECT ID,
       val,
       COUNT(id_distinct) OVER (ORDER BY ID RANGE 3 PRECEDING) cnt_disinct_ids
FROM   (SELECT ID,
               val,
               CASE WHEN row_number() OVER (PARTITION BY ID ORDER BY val) = 1 THEN ID END id_distinct
        FROM   sd);

        ID        VAL CNT_DISINCT_IDS
---------- ---------- ---------------
         1         10               1
         1         20               1
         2         30               2
         2         40               2
         4         50               3
         4         60               3
         6         70               2

ETA: proof that the above technique works for your data:

WITH your_table AS (SELECT 'ABCDE' account_sk, 23 day_sk FROM dual UNION ALL
                    SELECT 'ABCDE' account_sk, 23 day_sk FROM dual UNION ALL
                    SELECT 'ABCDE' account_sk, 24 day_sk FROM dual UNION ALL
                    SELECT 'ABCDE' account_sk, 25 day_sk FROM dual UNION ALL
                    SELECT 'ABCDE' account_sk, 53 day_sk FROM dual UNION ALL
                    SELECT 'ABCDE' account_sk, 53 day_sk FROM dual UNION ALL
                    SELECT 'ABCDE' account_sk, 55 day_sk FROM dual UNION ALL
                    SELECT 'VWXYZ' account_sk, 10 day_sk FROM dual UNION ALL
                    SELECT 'VWXYZ' account_sk, 12 day_sk FROM dual UNION ALL
                    SELECT 'VWXYZ' account_sk, 40 day_sk FROM dual UNION ALL
                    SELECT 'VWXYZ' account_sk, 40 day_sk FROM dual)
SELECT account_sk,
       day_sk,
       COUNT(day_sk_distinct) OVER (PARTITION BY account_sk ORDER BY day_sk RANGE BETWEEN 29 PRECEDING AND CURRENT ROW) count_distinct_day_sk
FROM   (SELECT account_sk,
               day_sk,
               CASE WHEN row_number() OVER (PARTITION BY account_sk, day_sk ORDER BY day_sk) = 1 THEN day_sk END day_sk_distinct
        FROM   your_table);

ACCOUNT_SK     DAY_SK COUNT_DISTINCT_DAY_SK
---------- ---------- ---------------------
ABCDE              23                     1
ABCDE              23                     1
ABCDE              24                     2
ABCDE              25                     3
ABCDE              53                     3
ABCDE              53                     3
ABCDE              55                     2
VWXYZ              10                     1
VWXYZ              12                     2
VWXYZ              40                     2
VWXYZ              40                     2

Upvotes: 1

Related Questions