Reputation: 9552
I have three tables: categories
(id, name), products
(id, category_id, name)), and purchases
(id, user_id, product_id). A product
belongs to a category
. Users can purchase many products
. My intention is to find the most popular category
per user.
However, I need to use the result-set of the query as a sub-query, so using any ORDER BY
statements unfortunately is off, due to SQL Server limitations (the dreaded The ORDER BY clause is invalid in views, inline functions, derived tables, and subqueries, unless TOP is also specified.
error).
My approach has been to create a list of all purchases
per user per category
. I then have a MAX
function to pick out the maximum amount of purchases
. I JOIN that result to the original query (duplicated as a sub-query) to retrieve the category_id
in question, and finally I grab the category name.
There are two issues with my query:
products
, 2 from 2 categories
each), I end up with a duplicate row for that user. Fiddle:
http://sqlfiddle.com/#!6/8821b/5
I'd appreciate it if anyone be able to help me figure out a way to ensure only a single row is returned per user, as well as a way to remove the duplicate sub-query.
Thanks!
Upvotes: 4
Views: 2666
Reputation: 2177
First, thanks for providing the example in SQLFiddle. It makes helping ALOT easier.
You can use row_number for a more precise way of getting the "top" record. In this example I chose to use category_name as a secondary sort criterion after the count.
SELECT user_id, category_name, category_count
FROM
(
SELECT
user_id, COUNT(1) as category_count, category_name,
ROW_NUMBER() OVER (
PARTITION BY user_id
ORDER BY COUNT(1) DESC, category_name ASC)
as ordinal_position
FROM
purchases p
JOIN products p2 ON p.product_id = p2.id
JOIN categories c ON p2.category_id = c.id
GROUP BY user_id, category_name
) a
WHERE ordinal_position = 1
ORDER BY category_count DESC
Upvotes: 5