Reputation: 80
I am trying to do a distinct count of names partitioned over their roles. So, in the example below: I have a table with the names and the person's role.
I would like a role count column that gives the total number of distinct people in that role. For example, the role manager comes up four times but there are only 3 distinct people for that role - Sam comes up again on a different date.
If I remove the date column, it works fine using:
select
a.date,
a.Name,
a.Role,
count(a.Role) over (partition by a.Role) as Role_Count
from table a
group by a.date, a.name, a.role
Including the date column then makes it count the total roles rather than by distinct name (which I know I haven't identified in the partition). Giving 4 managers and 3 analysts.
How do I fix this?
Desired output:
Date | Name | Role | Role_Count |
---|---|---|---|
01/01 | Sam | Manager | 3 |
02/01 | Sam | Manager | 3 |
01/01 | John | Manager | 3 |
01/01 | Dan | Manager | 3 |
01/01 | Bob | Analyst | 2 |
02/01 | Bob | Analyst | 2 |
01/01 | Mike | Analyst | 2 |
Current output:
Date | Name | Role | Role_Count |
---|---|---|---|
01/01 | Sam | Manager | 4 |
02/01 | Sam | Manager | 4 |
01/01 | John | Manager | 4 |
01/01 | Dan | Manager | 4 |
01/01 | Bob | Analyst | 3 |
02/01 | Bob | Analyst | 3 |
01/01 | Mike | Analyst | 3 |
Upvotes: 2
Views: 7581
Reputation: 71144
Unfortunately, COUNT(DISTINCT
is not available as a window aggregate. But we can use a combination of DENSE_RANK
and MAX
to simulate it:
select
a.Name,
a.Role,
MAX(rnk) OVER (PARTITION BY date, Role) as Role_Count
from (
SELECT *,
DENSE_RANK() OVER (PARTITION BY date, Role ORDER BY Name) AS rnk
FROM table
) a
If Name
may have nulls then we need to take that into account:
select
a.Name,
a.Role,
MAX(CASE WHEN Name IS NOT NULL THEN rnk END) OVER (PARTITION BY date, Role) as Role_Count
from (
SELECT *,
DENSE_RANK() OVER (PARTITION BY date, Role, CASE WHEN Name IS NULL THEN 0 ELSE 1 END ORDER BY Name) AS rnk
FROM table
) a
Upvotes: 2
Reputation: 1269445
Unfortunately, SQL Server (and other databases as well) don't support COUNT(DISTINCT)
as a window function. Fortunately, there is a simple trick to work around this -- the sum of DENSE_RANK()
s minus one:
select a.Name, a.Role,
(dense_rank() over (partition by a.Role order by a.Name asc) +
dense_rank() over (partition by a.Role order by a.Name desc) -
1
) as distinct_names_in_role
from table a
group by a.name, a.role
Upvotes: 4