Reputation: 2110
Say I have the following basic MySQL data:
CREATE TABLE my_words (my_word VARCHAR(255));
INSERT INTO my_words VALUES ('dog');
INSERT INTO my_words VALUES ('cat');
INSERT INTO my_words VALUES ('tree');
INSERT INTO my_words VALUES ('ball');
INSERT INTO my_words VALUES ('life');
INSERT INTO my_words VALUES ('complex');
INSERT INTO my_words VALUES ('digeridoo');
INSERT INTO my_words VALUES ('hamster');
INSERT INTO my_words VALUES ('it');
INSERT INTO my_words VALUES ('house');
INSERT INTO my_words VALUES ('love');
INSERT INTO my_words VALUES ('zealous');
INSERT INTO my_words VALUES ('nevis');
INSERT INTO my_words VALUES ('mountain');
INSERT INTO my_words VALUES ('call');
INSERT INTO my_words VALUES ('nail');
INSERT INTO my_words VALUES ('rat');
INSERT INTO my_words VALUES ('hat');
SELECT CONCAT(w1.my_word, w2.my_word) joined
FROM my_words w1, my_words w2
WHERE LENGTH(CONCAT(w1.my_word, w2.my_word)) = 8
ORDER BY RAND() LIMIT 5;
I can write the SQL statement at the end to generate a list of 5 random concatenated strings made up of 2 words, where the total length of the string is 8 characters.
That works fine for a simple table of data like I've got in the sample.
However, the "real" table I am working with contains about 6,200 lines.
If I try the same type of statement it takes 10 seconds to generate the 5 strings.
I guess the SQL is very inefficient as it is searching through the table twice each time and those tables aren't joined in any way.
I wondered if there is a simpler way to extract strings of words made up of 2 words from the table, where the length of the concatenated string is 8 characters long (though that can change - I'm just using 8 as an example).
Thanks
Update 1
Explain Plan of:
EXPLAIN
SELECT CONCAT(w1.fld_un, w2.fld_un) joined
FROM j_un w1
JOIN j_un w2 ON w1.fld_len = 8 - w2.fld_len
WHERE w2.fld_len < 8
ORDER BY RAND()
LIMIT 5;
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE w2 range un_len un_len 5 \N 2694 Using where; Using temporary; Using filesort
1 SIMPLE w1 ref un_len un_len 5 func 527 Using where
Update 2
I'm not sure if it's relevant, but the "fld_un" table has about 6,200 rows.
The "word" is held in the "fld_un" column.
The structure of the table is:
Field Type Null Key Default Extra
fld_id int(11) NO PRI NULL auto_increment
fld_un varchar(255) YES NULL
fld_cat_id int(11) YES MUL NULL
fld_len int(2) NO MUL NULL
These indexes exist on the table:
Keyname Type Cardinality Field
PRIMARY PRIMARY 6318 fld_id
cat INDEX 15 fld_cat_id
bob INDEX 11 fld_len
Does it matter that there is already a primary index on the table? I don't technically need that I suppose.
Statement:
SELECT CONCAT(word1, word2) joined
FROM (
SELECT w1.fld_un word1, w2.fld_un word2
FROM j_un2 w1
JOIN j_un2 w2 ON w1.fld_len = 8 - w2.fld_len
WHERE w2.fld_len < 8
ORDER BY RAND()
LIMIT 5) x;
Query took 23.6805 sec
Explain Plan:
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY <derived2> ALL NULL NULL NULL NULL 5
2 DERIVED w2 range bob bob 4 NULL 4627 Using where; Using temporary; Using filesort
2 DERIVED w1 ref bob bob 4 func 527 Using where
When I revise the "bob" index to include 2 columns as suggested by Thorsten Kettner:
Keyname Type Cardinality Field
bob INDEX 11 fld_len, fld_un
And re-test:
SELECT CONCAT(word1, word2) joined
FROM (
SELECT w1.fld_un word1, w2.fld_un word2
FROM j_un2 w1
JOIN j_un2 w2 ON w1.fld_len = 8 - w2.fld_len
WHERE w2.fld_len < 8
ORDER BY RAND()
LIMIT 5) x;
The query took 30.3394 sec to return 5 rows.
Explain Plan:
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY <derived2> ALL NULL NULL NULL NULL 5
2 DERIVED w2 range bob bob 4 NULL 4211 Using where; Using temporary; Using filesort
2 DERIVED w1 ref bob bob 4 func 527 Using where
Update 3
Ran without "order by rand()" and it ran in 0.0011 sec!
Upvotes: 0
Views: 2043
Reputation: 780974
You can add a column, e.g. word_length
that contains the length of the word, and add an index on the word_length
column. Normally it would be poor design to include data that can be derived from another column, but in this case you need to break purity for performance sake. Then your query can use a JOIN
condition using this column:
SELECT CONCAT(w1.my_word, w2.my_word) joined
FROM my_words w1
JOIN my_words w2 ON w1.word_length = 8 - w2.word_length
WHERE w2.word_length < 8
ORDER BY RAND()
LIMIT 5
You can use INSERT
and UPDATE
triggers to fill in the word_length
column automatically.
It might also be helpful to do the concatenation after filtering down to the 5 rows:
SELECT CONCAT(word1, word2) joined
FROM (
SELECT w1.my_word word1, w2.my_word word2
FROM my_words w1
JOIN my_words w2 ON w1.word_length = 8 - w2.word_length
WHERE w2.word_length < 8
ORDER BY RAND()
LIMIT 5) x
Upvotes: 4
Reputation: 4549
If you can add another column and index to index the length of the word, you can do a join ON clause with the second Word length = desired length - first word length.
I think the real slow part here though is the rand(). You might get better results if you don't do that on the cartesian product, but on each table individually. Although the results are different if you do that...
Upvotes: 0