paschdan
paschdan

Reputation: 29

Optimize SQL-Query that is using REGEXP in a JOIN

I have the following situation:

Table Words:

| ID |   WORD |
|----|--------|
|  1 |     us |
|  2 |     to |
|  3 | belong |
|  4 |    are |
|  5 |   base |
|  6 |   your |
|  7 |    all |
|  8 |     is |
|  9 |  yours |

Table Sentence:

| ID |                                  SENTENCE |
|----|-------------------------------------------|
|  1 | <<7>> <<6>> <<5>> <<4>> <<3>> <<2>> <<1>> |
|  2 |                         <<7>> <<8>> <<9>> |

And i want to replace the <<(\d)>> with the equivalent word from the Word-Table.

So the result should be

| ID |                       SENTENCE |
|----|--------------------------------|
|  1 | all your base are belong to us |
|  2 |                   all is yours |

What i came up with is the following SQL-Code:

SELECT id, GROUP_CONCAT(word ORDER BY pos SEPARATOR ' ') AS sentence FROM (
    SELECT sentence.id, words.word, LOCATE(words.id, sentence.sentence) AS pos
    FROM sentence
    LEFT JOIN words
    ON (sentence.sentence REGEXP CONCAT('<<',words.id,'>>'))
    ) AS TEMP
GROUP BY id

I made a sqlfiddle for this:

http://sqlfiddle.com/#!2/634b8/4

The code basically is working, but i'd like to ask you pros if there is a way without a derived table or without filesort in the execution plan.

Upvotes: 2

Views: 153

Answers (1)

Nanne
Nanne

Reputation: 64429

You should make a table with one entry per word, so your sentense (sic) can be made by joining on that table. It would look something like this

SentenceId, wordId, location
2,          7,       1
2,          8,       2
2,          9,       3

They way you have it set up, you are not taking advantage of your database, basically putting several points of data in 1 table-field.

The location field (it is tempting to call it "order", but as this is an SQL keyword, don't do it, you'll hate yourself) can be used to 'sort' the sentence.

(and you might want to rename sentense to sentence?)

Upvotes: 3

Related Questions