SchmitzIT
SchmitzIT

Reputation: 9552

SQL Server: Find most popular category of products bought per user for use in subquery

I have three tables: categories (id, name), products (id, category_id, name)), and purchases (id, user_id, product_id). A product belongs to a category. Users can purchase many products. My intention is to find the most popular category per user.

However, I need to use the result-set of the query as a sub-query, so using any ORDER BY statements unfortunately is off, due to SQL Server limitations (the dreaded The ORDER BY clause is invalid in views, inline functions, derived tables, and subqueries, unless TOP is also specified. error).

My approach has been to create a list of all purchases per user per category. I then have a MAX function to pick out the maximum amount of purchases. I JOIN that result to the original query (duplicated as a sub-query) to retrieve the category_id in question, and finally I grab the category name.

There are two issues with my query:

  1. Obviously I'd prefer not using the same query twice in my code. However, I cannot rely on using CTEs or temp tables, as the result of this query is intended to be linked to a view that has a subset of user-data as it is, and the VIEW code is intended to be used in a third-party package, that can handle only basic SQL code.
  2. In case of a tie (say a useer bought 4 products, 2 from 2 categories each), I end up with a duplicate row for that user.

Fiddle:

http://sqlfiddle.com/#!6/8821b/5

I'd appreciate it if anyone be able to help me figure out a way to ensure only a single row is returned per user, as well as a way to remove the duplicate sub-query.

Thanks!

Upvotes: 4

Views: 2666

Answers (1)

Jim V.
Jim V.

Reputation: 2177

First, thanks for providing the example in SQLFiddle. It makes helping ALOT easier.

You can use row_number for a more precise way of getting the "top" record. In this example I chose to use category_name as a secondary sort criterion after the count.

SELECT user_id, category_name, category_count
FROM
(
  SELECT 
      user_id, COUNT(1) as category_count, category_name, 
      ROW_NUMBER() OVER (
          PARTITION BY user_id 
          ORDER BY COUNT(1) DESC, category_name ASC) 
          as ordinal_position
  FROM
      purchases p 
          JOIN products p2 ON p.product_id = p2.id
          JOIN categories c ON p2.category_id = c.id        
  GROUP BY user_id, category_name
 ) a
WHERE ordinal_position = 1
ORDER BY category_count DESC

Example at SQL Fiddle.

Upvotes: 5

Related Questions