Svetoslav
Svetoslav

Reputation: 4686

How to make MYSQL search for minimum 3 keyword matches?

Hi I have one big problem with one MYSQL search. My database TABLE looks like this :

+------+----------+------+
|  id  | keywords | file |
+------+----------+------+

At keywords there are many keywords for each entry seperated with comas. (keyword1,keyword2...).

At PHP array there are listed some keywords (5-10). And my search must get all DB entries which got atleast 3 from those keywords. Its not required to got all of those words! But it can't work and with just one.

Can somebody help me with that query? I don't got good idea how to make it.

Upvotes: 0

Views: 259

Answers (2)

unutbu
unutbu

Reputation: 880309

If you had this setup:

table files:

+------+-------+
| id   | file  |
+------+-------+
| 1000 | foo   |
| 1001 | bar   |
+------+-------+

table keywords:

+----+-------+
| id | word  |
+----+-------+
|  9 | lorem |
| 10 | ipsum |
+----+-------+

table filekeywords:

+----+--------+--------+
| id | fileid | wordid |
+----+--------+--------+
|  1 | 1000   | 9      |
|  2 | 1000   | 10     |
|  3 | 1001   | 10     |
+----+--------+--------+

You could find files with keywords lorem, ipsum, dolor like this:

SELECT COUNT(DISTINCT(k.word)), f.*
FROM files f
INNER JOIN filekeywords fk
  ON fk.fileid = f.id
INNER JOIN keywords k
  ON k.id = fk.wordid
WHERE k.word in ('lorem', 'ipsum', 'dolor')  
GROUP BY f.id
HAVING COUNT(DISTINCT(k.word)) >= 3

Upvotes: 0

Brian Hoover
Brian Hoover

Reputation: 7991

That's a challenge. The brute force method would be to use a UNION in a subquery with a count.

For example,

select id, file, count(*) from
  (select distinct id, file
     from file_table
       where FIND_IN_SET(keyword1, keywords)
   UNION ALL
   select distinct id, file
     from file_table
       where FIND_IN_SET(keyword2, keywords)
   UNION ALL
   select distinct id, file
     from file_table
       where FIND_IN_SET(keyword3, keywords)
   UNION ALL
   select distinct id, file
     from file_table
       where FIND_IN_SET(keyword4, keywords)
   .... MORE UNION ALL ....) as files
group by id, file
having count(*) >= 3

More efficiently, you could have a separate table with keywords and ID, one keyword/ID combo per row. This would eliminate the wildcard search and make the query more efficient.

The next step would be to go to something like ElasticSearch and filter on the score of the result.

Upvotes: 2

Related Questions