Count cumulative distinct

I have the table below. Is it possible to do a cumulative distinct count? For example, if A1 has 3 distinct values, then its count is 3. Next, take A1 and A2 together: if they have 5 distinct values between them, the count for A2 is 5. Repeat up to A1 + A2 + ... + An, counting the distinct values at each step.

A	V
A1	V1
A1	V2
A1	V2
A2	V1
A2	V2
A2	V3

My expected output would be:

A	C
A1	2
A2	3
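
For reference, the expected output can be reproduced outside the database with a few lines of Python. This is only a sketch of the intended semantics, not a SQL solution; the helper name `cumulative_distinct` is made up for illustration, and groups are ordered as plain strings (so "A10" would sort before "A2", just as a string `ORDER BY` would).

```python
# Hypothetical helper (not from the question): a plain-Python reference
# for what "cumulative distinct count" means here.
def cumulative_distinct(rows):
    """Return [(a, count)], where count is the number of distinct v values
    seen in every group up to and including a (groups sorted as strings)."""
    seen = set()
    result = []
    for a in sorted({a for a, _ in rows}):
        # Add this group's values to everything seen so far.
        seen.update(v for group, v in rows if group == a)
        result.append((a, len(seen)))
    return result


# The table from the question.
rows = [
    ("A1", "V1"), ("A1", "V2"), ("A1", "V2"),
    ("A2", "V1"), ("A2", "V2"), ("A2", "V3"),
]

for a, c in cumulative_distinct(rows):
    print(a, c)
# prints:
# A1 2
# A2 3
```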

Upvotes: 0

Answers (3)

MT0

Reputation: 167981

You can use a partitioned outer join to generate a row for every combination of A and V, and then use the FIRST_VALUE analytic function to find, for each V, whether it occurs in the current or any preceding A:

SELECT a,
       COUNT( DISTINCT fv ) AS c
FROM   (
  SELECT t.a,
         FIRST_VALUE(t.v) IGNORE NULLS OVER (
           PARTITION BY v.v
           ORDER BY t.a
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
         ) AS fv
  FROM   ( SELECT DISTINCT v FROM table_name ) v
         LEFT OUTER JOIN table_name t
         PARTITION BY ( t.a )
         ON ( t.v = v.v )
)
GROUP BY a
ORDER BY a

Which, for the sample data:

CREATE TABLE table_name ( A, V ) AS
SELECT 'A1', 'V1' FROM DUAL UNION ALL
SELECT 'A1', 'V2' FROM DUAL UNION ALL
SELECT 'A1', 'V3' FROM DUAL UNION ALL
SELECT 'A2', 'V1' FROM DUAL UNION ALL
SELECT 'A2', 'V3' FROM DUAL UNION ALL
SELECT 'A2', 'V4' FROM DUAL UNION ALL
SELECT 'A3', 'V2' FROM DUAL UNION ALL
SELECT 'A3', 'V3' FROM DUAL UNION ALL
SELECT 'A4', 'V1' FROM DUAL UNION ALL
SELECT 'A4', 'V5' FROM DUAL;

Outputs:

A	C
A1	3
A2	4
A3	4
A4	5

db<>fiddle here
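
To make the idea behind the query easier to follow, here is a rough Python sketch of the same steps: build the full A x V grid (the role of the partitioned outer join), carry the first non-null match forward within each V (the role of FIRST_VALUE ... IGNORE NULLS), and count per A. The function name `densified_counts` is invented for this illustration; it mimics the logic only and says nothing about how Oracle executes the query.

```python
# Illustrative sketch only: mimics the partitioned outer join plus
# FIRST_VALUE ... IGNORE NULLS logic in plain Python.
def densified_counts(rows):
    groups = sorted({a for a, _ in rows})   # every A, in ORDER BY t.a order
    values = sorted({v for _, v in rows})   # SELECT DISTINCT v
    present = set(rows)                     # (a, v) pairs that really exist
    counts = {a: 0 for a in groups}

    for v in values:                        # PARTITION BY v.v
        carried = None                      # first non-null t.v seen so far
        for a in groups:                    # one grid row per (a, v)
            if carried is None and (a, v) in present:
                carried = v
            if carried is not None:
                counts[a] += 1              # this v counts toward COUNT(DISTINCT fv)
    return [(a, counts[a]) for a in groups]


# The sample data from the CREATE TABLE statement above.
sample = [
    ("A1", "V1"), ("A1", "V2"), ("A1", "V3"),
    ("A2", "V1"), ("A2", "V3"), ("A2", "V4"),
    ("A3", "V2"), ("A3", "V3"),
    ("A4", "V1"), ("A4", "V5"),
]

print(densified_counts(sample))
# [('A1', 3), ('A2', 4), ('A3', 4), ('A4', 5)]
```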

Upvotes: 0

forpas

Reputation: 164099

You can use the ROW_NUMBER() window function to find the first occurrence of each V, and then the COUNT() window function to count only these first occurrences:

SELECT DISTINCT A, 
       COUNT(CASE WHEN rn = 1 THEN 1 END) OVER (ORDER BY A) C
FROM (
  SELECT A, ROW_NUMBER() OVER (PARTITION BY V ORDER BY A) rn
  FROM tablename 
) t
ORDER BY A

See the demo.
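
As a sanity check on the logic, the two steps can be imitated in Python: number the rows within each V in order of A, then keep a running count of the rows numbered 1. This is a sketch under the assumption that A sorts as a plain string; `running_first_occurrences` is a made-up name, not anything from the query.

```python
# Illustrative sketch: row-number each V's rows, then keep a running count
# of the rows that are a first occurrence (rn = 1).
def running_first_occurrences(rows):
    rn_by_v = {}
    flags = []                              # (a, 1 if rn == 1 else 0)
    for a, v in sorted(rows):               # visiting rows in A order
        rn_by_v[v] = rn_by_v.get(v, 0) + 1  # ROW_NUMBER() OVER (PARTITION BY V ORDER BY A)
        flags.append((a, 1 if rn_by_v[v] == 1 else 0))

    totals = {}
    running = 0
    for a, flag in flags:                   # COUNT(...) OVER (ORDER BY A)
        running += flag
        totals[a] = running                 # last write per A covers all its rows
    return sorted(totals.items())           # one row per A, like SELECT DISTINCT


# The table from the question, including the duplicate (A1, V2) row.
rows = [
    ("A1", "V1"), ("A1", "V2"), ("A1", "V2"),
    ("A2", "V1"), ("A2", "V2"), ("A2", "V3"),
]

print(running_first_occurrences(rows))
# [('A1', 2), ('A2', 3)]
```

The duplicate (A1, V2) row gets rn = 2, so it does not add to the count.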

Upvotes: 0

Gordon Linoff

Reputation: 1269873

This answers the original version of the question.

You can aggregate twice: once to keep the first occurrence of each v, and a second time to count those first occurrences per a:

select a, count(*) as new_cs
from (select v, min(a) as a
      from t
      group by v
     ) v
group by a;

Note: The above only shows values of a that have new values of v. If you want all values of a, then window functions are a better approach:

select a, sum(case when seqnum = 1 then 1 else 0 end) as c
from (select t.*, row_number() over (partition by v order by a) as seqnum
      from t
     ) t
group by a
order by a;

Here is a db<>fiddle.
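
Both queries count the values that are new in each a. The Python sketch below mirrors that logic and then, as an extra step that is not part of the queries above, adds a running total to get the cumulative figure the current question asks for. The names `new_values_per_group` and `running_total` are hypothetical, chosen for this example.

```python
from collections import Counter
from itertools import accumulate


# Illustrative sketch: count, per a, the values of v that first appear there.
def new_values_per_group(rows):
    first_a = {}
    for a, v in rows:
        if v not in first_a or a < first_a[v]:
            first_a[v] = a                    # min(a) ... group by v
    new_counts = Counter(first_a.values())    # count(*) ... group by a
    groups = sorted({a for a, _ in rows})
    # Include every a, with 0 where nothing new appears.
    return [(a, new_counts.get(a, 0)) for a in groups]


# Assumed extra step (not in the queries above): a running sum over a.
def running_total(per_group):
    totals = accumulate(c for _, c in per_group)
    return [(a, t) for (a, _), t in zip(per_group, totals)]


# The table from the question.
rows = [
    ("A1", "V1"), ("A1", "V2"), ("A1", "V2"),
    ("A2", "V1"), ("A2", "V2"), ("A2", "V3"),
]

per_group = new_values_per_group(rows)
print(per_group)                 # [('A1', 2), ('A2', 1)]
print(running_total(per_group))  # [('A1', 2), ('A2', 3)]
```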

Upvotes: 2
