Kirzilla
Kirzilla

Reputation: 16596

SQL : find rows and sort according to number of matching columns?

Let's imagine that we are having "cars" table with such a simple structure...

car_id INT
color ENUM('black','white','blue')
weight ENUM('light','medium','heavy')
type ENUM('van','sedan','limo')

Fist, I'm selecting car (1, black, heavy, limo), then I'd like to get list of related cars sorted by number of matching columns (without any column weight). So, first I'm expecting to see (black, heavy, limo) cars, then I'm expecting to see cars with only 2 matching fields etc.

Is it possible to execute this kind of sorting using SQL?

Sorry for my English, but I really hope that my question is clear for you.

Thank you.

Upvotes: 5

Views: 4259

Answers (4)

Brent Foxwell
Brent Foxwell

Reputation: 91

I know this is an old question, but you should be able to wrap an expression in parenthesis to evaluate it

SELECT   *           
FROM     `cars`
WHERE    `color` = "black"
   OR    `weight` = "heavy"
   OR    `type` = "limo"
ORDER BY (   (`color` = "black")
           + (`weight` = "heavy")
           + (`type` = "limo") 
         ) DESC

Each expression inside parenthesis will equal 1 if true, 0 if false; thus the sum of which will be the number of matches.

Upvotes: 7

Ian
Ian

Reputation: 1

mysql> select * from cars;
+--------+-------+--------+-------+
| car_id | color | weight | type  |
+--------+-------+--------+-------+
|      1 | black | light  | van   |
|      2 | black | light  | sedan |
|      3 | black | light  | limo  |
|      4 | black | medium | van   |
|      5 | black | medium | sedan |
|      6 | black | medium | limo  |
|      7 | black | heavy  | van   |
|      8 | black | heavy  | sedan |
|      9 | black | heavy  | limo  |
|     10 | white | light  | van   |
|     11 | white | light  | sedan |
|     12 | white | light  | limo  |
|     13 | white | medium | van   |
|     14 | white | medium | sedan |
|     15 | white | medium | limo  |
|     16 | white | heavy  | van   |
|     17 | white | heavy  | sedan |
|     18 | white | heavy  | limo  |
|     19 | blue  | light  | van   |
|     20 | blue  | light  | sedan |
|     21 | blue  | light  | limo  |
|     22 | blue  | medium | van   |
|     23 | blue  | medium | sedan |
|     24 | blue  | medium | limo  |
|     25 | blue  | heavy  | van   |
|     26 | blue  | heavy  | sedan |
|     27 | blue  | heavy  | limo  |
+--------+-------+--------+-------+
27 rows in set (0.00 sec)

select *,
(case
  when color = 'black' and weight = 'heavy' and type = 'limo'
    then 3
  when ( color = 'black' and type = 'limo') or 
      (color = 'black' and weight = 'heavy') or 
       (weight = 'heavy' and type = 'limo')
    then 2
  else 1
end) sort_order
from cars
where color = 'black' or weight = 'heavy' or type = 'limo'
order by sort_order desc;


+--------+-------+--------+-------+------------+
| car_id | color | weight | type  | sort_order |
+--------+-------+--------+-------+------------+
|      9 | black | heavy  | limo  |          3 |
|     27 | blue  | heavy  | limo  |          2 |
|     18 | white | heavy  | limo  |          2 |
|      8 | black | heavy  | sedan |          2 |
|      7 | black | heavy  | van   |          2 |
|      6 | black | medium | limo  |          2 |
|      3 | black | light  | limo  |          2 |
|     24 | blue  | medium | limo  |          1 |
|     25 | blue  | heavy  | van   |          1 |
|     21 | blue  | light  | limo  |          1 |
|     26 | blue  | heavy  | sedan |          1 |
|     17 | white | heavy  | sedan |          1 |
|     16 | white | heavy  | van   |          1 |
|     15 | white | medium | limo  |          1 |
|     12 | white | light  | limo  |          1 |
|      5 | black | medium | sedan |          1 |
|      4 | black | medium | van   |          1 |
|      2 | black | light  | sedan |          1 |
|      1 | black | light  | van   |          1 |
+--------+-------+--------+-------+------------+
19 rows in set (0.00 sec)

Upvotes: 0

Joe Lloyd
Joe Lloyd

Reputation: 574

There are probably a few ways to optimise the sub-queries, but without using case statements or sub-optimal join clauses:

select
        *
    from
        (
            select
                    selection.CarId,
                    selection.Colour,
                    selection.Weight,
                    selection.Type,
                    3 as Relevance
                from
                    tblCars as selection
                where
                    selection.Colour = 'black' and selection.Weight = 'light' and selection.Type = 'van'
            union all
            select
                    cars.CarId,
                    cars.Colour,
                    cars.Weight,
                    cars.Type,
                    count(*) as Relevance
                from
                    tblCars as cars
                inner join
                    (
                        select
                                byColour.CarId
                            from
                                tblCars as cars
                            inner join
                                tblCars as byColour
                            on
                                cars.Colour = byColour.Colour
                            where
                                cars.Colour = 'black' and cars.Weight = 'light' and cars.Type = 'van'
                                and
                                byColour.CarId <> cars.CarId
                        union all
                        select
                                byWeight.CarId
                            from
                                tblCars as cars
                            inner join
                                tblCars as byWeight
                            on
                                cars.Weight = byWeight.Weight
                            where
                                cars.Colour = 'black' and cars.Weight = 'light' and cars.Type = 'van'
                                and
                                byWeight.CarId <> cars.CarId
                        union all
                        select
                                byType.CarId
                            from
                                tblCars as cars
                            inner join
                                tblCars as byType
                            on
                                cars.Type = byType.Type
                            where
                                cars.Colour = 'black' and cars.Weight = 'light' and cars.Type = 'van'
                                and
                                byType.CarId <> cars.CarId
                    ) as matches
                on
                    cars.CarId = matches.CarId
                group by
                    cars.CarId,
                    cars.Colour,
                    cars.Weight,
                    cars.Type
        ) as results
    order by
        Relevance desc

Output:

CarId   Colour  Weight  Type    Relevance
1       black   light   van     3
3       white   light   van     2
4       blue    light   van     2
5       black   medium  van     2
6       white   medium  van     1
7       blue    medium  van     1
8       black   heavy   limo    1

Upvotes: 3

Tomalak
Tomalak

Reputation: 338326

Not overly efficient, but...

SELECT
  exact.car_id   AS e_car_id, exact.color   AS e_color, 
  exact.weight   AS e_weight, exact.type    AS e_type,
  related.car_id AS r_car_id, related.color AS r_color, 
  related.weight AS r_weight, related.type  AS r_type,
  CASE WHEN related.color = exact.color      THEN 1 ELSE 0 END
   + CASE WHEN related.weight = exact.weight THEN 1 ELSE 0 END
   + CASE WHEN related.type = exact.type     THEN 1 ELSE 0 END
  AS rank
FROM
  cars AS exact
  INNER JOIN cars AS related ON (
    related.car_id <> exact.car_id 
    AND CASE WHEN related.color = exact.color   THEN 1 ELSE 0 END
      + CASE WHEN related.weight = exact.weight THEN 1 ELSE 0 END
      + CASE WHEN related.type = exact.type     THEN 1 ELSE 0 END
    >= 1
  )
WHERE
  exact.car_id = 1 /* black, heavy, limo */
ORDER BY
  rank DESC

This would not run very fast on large data sets, since neither the JOIN nor the ORDER BY can make use of an index. Very probably a more optimal version exists.

Output on my test setup looks like this:

e_car_id  e_color  e_weight  e_type  r_car_id  r_color  r_weight  r_type  rank
1         black    heavy     limo    7         black    heavy     limo    3
1         black    heavy     limo    2         black    light     limo    2
1         black    heavy     limo    3         black    heavy     van     2
1         black    heavy     limo    4         black    medium    van     1
1         black    heavy     limo    5         blue     light     limo    1

Upvotes: 5

Related Questions