Alex
Alex

Reputation: 53

Sort results by most matches in IN() clause

I am looking for the best possible way to re-write a query that uses the IN() clause. I'd like to sort the results by products that have the most total matches in products_adjectives and products_interests.

I have 5 tables.

- products
    title, price, description, etc.
- interests
    interest_id, interest_name
- adjectives
    adjective_id, adjective_name
- products_interests
    product_id, interest_id
- products_adjectives
    product_id, adjective_id

I'm currently using this query to return any product that matches any adjective or interest. Here's an example:

SELECT *
  FROM products
    LEFT JOIN products_adjectives
      ON products_adjectives.product_id = products.id
    LEFT JOIN products_interests
      ON products_interests.product_id = products.id
  WHERE products_adjectives.adjective_id IN (3,6,9,12,13) OR products_interests.interest_id IN (2,5,8,12,16,18)
  GROUP BY products.id ORDER BY ABS(products.price)

Currently, any product that exists in any of the adjective_ids or interest_ids is returned. That query will return any products that have any adjective_id 3,6,9,12,13 or any interest_id 2,5,8,12,16,18 I'd like to continue returning all results, but instead I'd like them sorted by which products contain the most matches of adjectives or interests. So, if a product has adjectives 3,6,9,10,11 I'd like that ahead of a product that has adjective 3,4,5,7,8

Any thoughts or direction would be appreciated. If I'm going about this entirely wrong (with the IN() clause) please let me know!

Thanks!

Upvotes: 2

Views: 93

Answers (2)

Gordon Linoff
Gordon Linoff

Reputation: 1269703

You have a problem with your query because you are generating a cartesian product between the interests and adjectives for a given product.

The simplest way to fix this is to count the distinct matches. So, the following order by clause should do what you want:

ORDER BY (count(distinct interest_id) + count(distinct adjective_id)) desc,
         abs(product.price)

You could fix this problem in other ways, such as counting the interests in a subquery and counting the adjectives in a subquery. That would be the recommended approach, particularly if the lists were to become large. (Say, 30 matching interests and 30 matching adjectives would generate an intermediate table with almost one thousand rows for the product.)

Upvotes: 0

Andy Jones
Andy Jones

Reputation: 6275

I'm thinking something like the below. The key idea you were missing is ordering by count(*) DESC The downside of this query is that no products will be pulled if there are no matching records in the interests or adjectives tables. However, since I think you are pulling things based on their relevance, you should not want products with zero relevance.

SELECT *, count(*) as relevance 
FROM products, products_adjectives, products_interests
WHERE products.id = products_interests.product_id 
  AND products.id = products_adjectives.product_id 
  AND (interest_id IN (14, 22, 78) OR adjective_id IN (8, 17, 26))
GROUP BY products.id
ORDER BY count(*) DESC, price ASC

See fiddle: http://sqlfiddle.com/#!3/838571/2

Upvotes: 1

Related Questions