amoult
amoult

Reputation: 11

need faster alternative for FIND_IN_SET when Joining tables

I want to perform select if TABLE1 ID is found on TABLE2 comma separated column.

table1:
+---------+
| id      |
+---------+
|12345678 |
+---------+
|12322222 |
+---------+

table2:
+---------------------------------------------+
| manyids                                     |
+---------------------------------------------+
|12345678,1111111,2222233,0000111,65321       |
+---------------------------------------------+
|2222233,12322222                             |
+---------------------------------------------+
|12322222                                     |
+---------------------------------------------+

This is working fine on smaller test table:

SELECT table1.id,
COUNT(table1.id) AS occurences
FROM table1 JOIN table2 ON FIND_IN_SET(table1.id, table2.manyids ) > 0
GROUP BY table1.id HAVING occurences > 0 
ORDER BY occurences DESC

However actual TABLE1 I want to perform select has over 500k rows and FIND_IN_SET is just too slow. Any alternatives?

Upvotes: 1

Views: 6462

Answers (1)

Johan
Johan

Reputation: 76723

The only sensible alternative will be to normalize the tables:

Example

table tag
---------
id integer auto_increment primary key
name varchar(40)

table article
-------------
id integer auto_increment primary key
title varchar(1000)
content text

table tag_link
--------------
article_id integer foreign key references article(id)
tag_id integer foreign key references tag(id)
primary key article_id, tag_id

Because all the fields are indexed, you can query easily and very very fast like so:

SELECT t.name FROM article AS a
INNER JOIN tag_link tl ON (tl.article_id = a.id)
INNER JOIN tag t ON (t.id = tl.tag_id)
WHERE a.id = '45785'

Option 2 bad idea, much worse than option 1
If you really really cannot change the setup, create a fulltext index on field manyids.

And change the query to:

SELECT table1.id,
COUNT(table1.id) AS occurences
FROM table1 
JOIN table2 ON MATCH(table2.manyids) 
               AGAINST (CONCAT("+'",table1.id,"'") IN BOOLEAN MODE)  
/*boolean mode is required*/
GROUP BY table1.id HAVING occurences > 0 
ORDER BY occurences DESC

If any id is in the stopword list it will not match. Note that there are no numbers in this list.

Links
http://dev.mysql.com/doc/refman/5.5/en/fulltext-stopwords.html
http://dev.mysql.com/doc/refman/5.5/en/fulltext-boolean.html

Note that you will need to tweak the min and max word length that the full-text index takes into account: see: http://dev.mysql.com/doc/refman/5.5/en/fulltext-fine-tuning.html

Upvotes: 6

Related Questions