Reputation: 2168
I'm working on a project, where we need to count the number of distinct rows. A simplified version of the scenario includes a user
table, a keyword
table and a keyword_user
table.
The user
table just includes common user meta data, like name etc. The other tables are listed below.
keyword_user:
id
user_id
keyword_id
keyword:
id,
description
What I want to do, is to find a max number of users (5), based on the users keyword_id's, while also counting the total number of matching rows. The count must be distinct.
The query:
SELECT TOP 5 u.[id],
u.[firstname],
u.[lastname],
total = Count(*) OVER()
FROM [user] u
INNER JOIN [keyword_user] ku
ON u.[id] = ku.[user_id]
WHERE ( ku.keyword_id IN ( '5f6501ec-0a71-4067-a21d-3c5f87a76411', 'c19b95c0-8554-4bbd-9526-db8f1c4f1edf'))
AND u.id NOT IN ( '12db3001-b3b9-4626-8a02-2519102cb53a' )
Resultset:
+--------------------------------------+-----------+----------+-------+
| id | firstname | lastname | total |
+--------------------------------------+-----------+----------+-------+
| F0527AC3-747A-45A6-9CF9-B1F6C7F548F8 | Kasper | Thomsen | 3 |
| 95988F6D-9C91-4779-B6C3-3D4B4D6AE836 | Michael | Jacobsen | 3 |
| 95988F6D-9C91-4779-B6C3-3D4B4D6AE836 | Michael | Jacobsen | 3 |
+--------------------------------------+-----------+----------+-------+
PROBLEM:
The problem being here, that Michael is counted twice, and the total count therefore is 3, when i want it to be 2. When using count() over()
you cannot parse an expression into it, that contains distinct. Also, if I just SELECT DISTINCT
, my resultset looks fine, besides the total count, which would still be 3.
If I need to include more information to support the question, please let me know, and I will try to answer the best I can.
MSSQL CREATE DB SCRIPT (SAMPLE DATA)
wanted resultset:
+--------------------------------------+-----------+----------+-------+
| id | firstname | lastname | total |
+--------------------------------------+-----------+----------+-------+
| F0527AC3-747A-45A6-9CF9-B1F6C7F548F8 | Kasper | Thomsen | 2 |
| 95988F6D-9C91-4779-B6C3-3D4B4D6AE836 | Michael | Jacobsen | 2 |
+--------------------------------------+-----------+----------+-------+
Upvotes: 10
Views: 1566
Reputation: 894
i'm a little confused in your case specially with the "keywords" and how they're related per user (this is just a process issue for me) thus found myself doing an outer group by containing your initial query as my source table.
Please comment below so we can improve this.
SELECT
id
, firstname
, lastname
, total
, COUNT(*) AS [per_user_count]
FROM (
SELECT TOP 5 u.[id],
u.[firstname],
u.[lastname],
total = Count(*) OVER()
FROM [user] u
INNER JOIN [keyword_user] ku
ON u.[id] = ku.[user_id]
WHERE
(
ku.keyword_id IN (
'5f6501ec-0a71-4067-a21d-3c5f87a76411'
, 'c19b95c0-8554-4bbd-9526-db8f1c4f1edf'
)
)
AND u.id NOT IN ('12db3001-b3b9-4626-8a02-2519102cb53a')
) AS T
GROUP BY
T.id
, T.firstname
, T.lastname
, T.total
EDIT : We really had a confusion there, then so I created a simpler script that shall exclude the keywords, just the unique users (to generate the total overall) and get the top 5 of it (random order).
SELECT
TOP 5
T.id
, T.firstname
, T.lastname
, Total = COUNT(*) OVER()
FROM (
SELECT DISTINCT
u.*
FROM [keyword_user] ku
LEFT JOIN [user] u
ON
ku.user_id = u.id
WHERE
(
ku.keyword_id IN (
'5f6501ec-0a71-4067-a21d-3c5f87a76411'
, 'c19b95c0-8554-4bbd-9526-db8f1c4f1edf')
)
AND ku.[user_id] NOT IN (
'12db3001-b3b9-4626-8a02-2519102cb53a'
)
) AS T
Thanks
EDIT : Your scenario is a straight forward "search of keywords" that is linked to an entity, with an overall count and a top 5 result. As how I understand the CTE (and based in MSDN), CTE is a great solution for hierarchical data mining (no need to do while and do whatever back-flip to get your organizational hierarchy) which doesn't really fit the scenario that we have here.
Upvotes: 1
Reputation: 32693
You really should explain what you need in the question, not in comments.
In CTE_Users
we find all distinct users for the given keywords.
Then join the result with user
to get user details. At least it produces the result that you expect with the given small sample data.
WITH
CTE_Users
AS
(
SELECT DISTINCT ku.user_id
FROM
keyword_user AS ku
WHERE
ku.keyword_id IN (
'5f6501ec-0a71-4067-a21d-3c5f87a76411',
'c19b95c0-8554-4bbd-9526-db8f1c4f1edf')
AND ku.user_id NOT IN (
'12db3001-b3b9-4626-8a02-2519102cb53a')
)
SELECT TOP(5)
u.id
,u.firstname
,u.lastname
,COUNT(*) OVER() AS total
FROM
user AS u
INNER JOIN CTE_Users ON CTE_Users.user_id = u.id
;
Upvotes: 1
Reputation: 1464
You can try:
SELECT TOP 5 * FROM (
SELECT
u.[id],
u.[firstname],
u.[lastname],
total = Count(*) OVER(PARTITION BY ku.keyword_id),
rownum = ROW_NUMBER() OVER(PARTITION BY ku.keyword_id ORDER BY u.ID)
FROM [user] u
INNER JOIN [keyword_user] ku
ON u.[id] = ku.[user_id]
WHERE (ku.keyword_id IN ( '5f6501ec-0a71-4067-a21d-3c5f87a76411', 'c19b95c0-8554-4bbd-9526-db8f1c4f1edf'))
AND u.id NOT IN ( '12db3001-b3b9-4626-8a02-2519102cb53a' )
) AS A ORDER BY A.rownum DESC
Upvotes: 0
Reputation: 15389
Try this:
I've create two temporary tables (#user
and #user_key
) and populate them.
create table #user (id int, name varchar(20))
create table #user_key (id int, fk_user int, content varchar(50))
insert into #user values
(1, 'Giuseppe'),
(2, 'Anna'),
(3, 'Angela'),
(4, 'Maria'),
(5, 'Ethra'),
(6, 'Piero')
insert into #user_key values
(1, 1, 'ciao'),
(2, 1, 'hello'),
(3, 2, 'hallo'),
(4, 4, 'hullo')
The extract query:
I use as main table #user
, so I add a subquery about total count but in the order by clause I try to order users about their keywords. You can add other conditions (as your In / NOT IN)
select top 5 id, name, (select COUNT(*) from #user_key uk)
from #user u
order by (select COUNT(*) from #user_key uk where uk.fk_user = u.id) desc
Go to SqlFiddle
EDIT
You want this?:
97D476C2-B52C-4D44-A460-44472CBF8817 Michael testing 2
F4FE5550-BC69-437E-91A0-5B11E0D9279E Kasper Test 2
or this?
97D476C2-B52C-4D44-A460-44472CBF8817 Michael testing 2
F4FE5550-BC69-437E-91A0-5B11E0D9279E Kasper Test 2
12DB3001-B3B9-4626-8A02-2519102CB53A Thomas Teil 2
Upvotes: 0