SQLGIT_GeekInTraining
SQLGIT_GeekInTraining

Reputation: 133

PARTITION BY to consider only two specific columns for aggregation?

My table has the following data:

REF_NO PRD_GRP ACC_NO
ABC 12 1234
ABC 9C 1234
DEF AB 7890
DEF TY 9891

I'm trying to build a query that summarises the number of accounts per customer - the product group is irrelevant for this purpose so my expected result is:

REF_NO PRD_GRP ACC_NO NO_OF_ACC
ABC 12 1234 1
ABC 9C 1234 1
DEF AB 7890 2
DEF TY 9891 2

I tried doing this using a window function:

SELECT
    T.REF_NO,
    T.PRD_GRP,
    T.ACC_NO,
    COUNT(T.ACC_NO) OVER (PARTITION BY T.REF_NO) AS NUM_OF_ACC

FROM TABLE T

However, the NUM_OF_ACC value returned is 2 and not 1 in the above example for the first customer (ABC). It seems that the query is simply counting the number of unique rows for each customer, rather than identifying the number of accounts as desired.

How can I fix this error?

Link to Fiddle - https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=83344cbe95fb46d4a1640caf0bb6d0b2

Upvotes: 1

Views: 2528

Answers (3)

Gordon Linoff
Gordon Linoff

Reputation: 1269993

The simplest way to implement count(distinct) as a window functions is by summing two dense_ranks():

SELECT T.REF_NO, T.PRD_GRP, T.ACC_NO,
       (-1 +
         DENSE_RANK() OVER (PARTITION BY t.REF_NO ORDER BY T.ACC_NO ASC) +
         DENSE_RANK() OVER (PARTITION BY t.REF_NO ORDER BY T.ACC_NO DESC)
        ) as cnt_distinct
FROM TABLE T

Upvotes: 0

Charlieface
Charlieface

Reputation: 71668

You need COUNT(DISTINCT, which is unfortunately not supported by SQL Server as a window function.

But you can simulate it with DENSE_RANK and MAX

SELECT
    T.REF_NO,
    T.PRD_GRP,
    T.ACC_NO,
    MAX(T.rn) OVER (PARTITION BY T.REF_NO) AS NUM_OF_ACC
FROM (
    SELECT *,
        DENSE_RANK() OVER (PARTITION BY T.REF_NO ORDER BY T.ACC_NO) AS rn
    FROM [TABLE] T
) T;

DENSE_RANK will count up rows ordered by ACC_NO, but ignoring ties, therefore the MAX of that will be the number of distinct values.

db<>fiddle.uk

Upvotes: 2

Dale K
Dale K

Reputation: 27250

What you need is COUNT(DISTINCT T.ACC_NO) which is unfortunately not supported in window functions. Therefore you have to write a sub-query to allow you to use COUNT(DISTINCT T.ACC_NO) without a window function.

SELECT
    T.REF_NO,
    T.PRD_GRP,
    T.ACC_NO,
    -- Use of DISTINCT is not allowed with the OVER clause.
    -- COUNT(DISTINCT T.ACC_NO) OVER (PARTITION BY T.REF_NO) AS NUM_OF_ACC,
    (
        SELECT COUNT(DISTINCT T1.ACC_NO)
        FROM TEST_DATA T1
        WHERE T1.REF_NO = T.REF_NO
    ) AS NUM_OF_ACC
FROM TEST_DATA T

Upvotes: 1

Related Questions