Denees
Denees

Reputation: 9198

One field search for multiple values

What is the best and optimised way to write a SQL query to search for a term with multiple columns in a table?

For example I have a table of products:

id | title   | color_id
-------------------
 1 | Dress   | 1 (red)
 2 | T-shirt | 3 (blue)
 3 | Pants   | 2 (green)
 4 | Socks   | 1 (red)
 5 | Dress   | 2 (green)
 6 | Shoes   | 2 (green)
 7 | Pants   | 3 (blue)

And a table of colors:

id | color
----------
 1 | Red
 2 | Green
 3 | Blue

And if a user enters in the term Red dress, as a result (s)he must see Product with id 1, and if user enters simply Red, as a result (s)he must see Products with id 1 and 4.

Update: There may be some inputs like dress red or red blue too.

Real version of tables are more complex, but I tried to explain in a simplest way.

Upvotes: 4

Views: 350

Answers (3)

fthiella
fthiella

Reputation: 49089

My idea is to use a query like this:

set @search = 'red dress';

SELECT *
FROM
  products INNER JOIN colors
  ON products.color_id = colors.id
WHERE
  (FIND_IN_SET(title, REPLACE(@search, ' ', ','))>0)+
  (FIND_IN_SET(color, REPLACE(@search, ' ', ','))>0) =
  LENGTH(@search)-LENGTH(REPLACE(@search, ' ', ''))+1;

The two replaces inside FIND_IN_SET are used to make a list of properties separated by commas:

red,dress

I'm then checking if title is in this set. If it is present, the value of:

(FIND_IN_SET(title, REPLACE(@search, ' ', ','))>0)

will be evaluated to 1, 0 otherwise. The same is for color.

The number of the properties in the search string can be calculated as:

LENGTH(@search)-LENGTH(REPLACE(@search, ' ', ''))+1

(yes, it is a dirty trick!). If the number of the properties that matches is the same as the properties in the @search string, the row will be returned. Please notice that performances will be poor.

Fiddle is here.

Upvotes: 1

RandomSeed
RandomSeed

Reputation: 29809

A simple solution that just "works" could be:

SELECT *
FROM products
JOIN colors ON colors.id = products.color_id
WHERE 
    ( title = "red" AND color = "dress" ) OR
    ( color = "red" AND title = "dress" )

This condition might perform better, if the optimiser is dumb enough not to notice it by itself:

WHERE 
    ( title = "red" OR title = "dress") AND
    ( color = "red" OR color = "dress")

If you add more properties to the problem (besides "title" and "color"), and if and if storage is not an issue, you may want to merge (and duplicate) all textual properties into a single VARCHAR column, and run a fulltext search on this column.

CREATE TABLE products_properties (
    product_id INT NOT NULL PRIMARY KEY,
    properties VARCHAR (255),
    FOREIGN KEY fk_product (product_id) REFERENCES products(id),
    FULLTEXT ftx_properties (properties)
);

The search becomes very simple:

SELECT products.*, colors.*
FROM products
JOIN colors ON colors.id = products.color_id
JOIN products_properties AS pp ON pp.product_id = products.id
WHERE MATCH(properties) AGAINST ("+red +dress")

This obviously makes no sense in this particular example, but the more properties, the more it will speed up the query. This denormalisation also comes at the cost of an increased complexity to maintain the products_properties table.

Now the problem becomes really hairy if you

  • have to deal with less normalised inputs such as "red blue"
  • have words that belong to both categories (eg. if "red" was also a valid cloth title) and still want to discriminate them (eg. a query like "red blue" returns a cloth named "red" with color "blue")

But it looks like this is outside of the scope of your question.

Upvotes: 2

Eric Petroelje
Eric Petroelje

Reputation: 60539

If I were doing this in SQL, I would typically break down the query into individual words before doing the query, then build the query dynamically based on how many words there were.

So for your example, the query might end up looking like this:

SELECT * 
FROM products p
JOIN colors c ON c.id = p.color_id
WHERE 
p.title LIKE '%red%' OR
c.color LIKE '%red%' OR
p.title LIKE '%dress%' OR
c.color LIKE '%dress%'

If you have lots of tables though, this can start to get pretty complicated. It's also not very efficient as its unlikely any indexes will be usable.

A better solution would be to use a dedicated text indexing product like Lucene (but that's a whole nother question..)

Upvotes: 1

Related Questions