Reputation: 11
I have a table where I'm trying to count the distinct number of members per group. I know there's duplicates based on the count(distinct *) function. But when I try to group them into the group and count distinct, it's not spitting out the number I'd expect.
select count(distinct memberid), count(*)
from dbo.condition c
output:
count | count |
---|---|
303,781 | 348,722 |
select groupid, count(*), count(distinct memberid)
from dbo.condition c
group by groupid
output:
groupid | count | count |
---|---|---|
2 | 19,984 | 19,984 |
3 | 25,689 | 25,689 |
5 | 14,400 | 14,400 |
24 | 56,058 | 56,058 |
25 | 200,106 | 200,106 |
29 | 27,847 | 27,847 |
30 | 1,370 | 1,370 |
31 | 3,268 | 3,268 |
The numbers in the second query equate when they shouldn't be. Does anyone know what I'm doing wrong? I need the 3rd column to be equal to 303,781 not 348,722.
Thanks!
Upvotes: 0
Views: 803
Reputation: 15482
There's nothing wrong with your second query. Since you're aggregating on the "groupid" field, the output you get tells you that there are no duplicates within the same groupid of the "memberid" values (basically counting values equates to counting distinctively).
On the other hand, in the first query the aggregation happens without any partitioning, whose output hints there are duplicate values across different "groupid" values.
Took the liberty of adding of an example that corroborates your answer:
create table aa (groupid int not null, memberid int not null );
insert into aa (groupid, memberid)
values
(1, 1), (1, 2), (1, 3), (2, 1), (3, 1), (3, 2), (3, 3), (4, 1), (4, 2), (4, 3), (4, 5), (5, 3)
select groupid, count(*), count(distinct memberid)
from aa group by groupid;
select count(*), count(distinct memberid)
from aa
Upvotes: 2