Thomas Kekeisen
Thomas Kekeisen

Reputation: 4406

Is there a way to improve this query with if in it?

I use this query to select a language string from a database containing strings in many languages. The database looks like this:

`string_id`   BIGINT
`language_id` BIGINT
`datetime`    DATETIME
`text`        TEXT

For example, the data can look like this:

`string_id` | `language_id` | `datetime`          | `text`
1           | 1             | 2014.04.22 14:43:00 | hello world
1           | 2             | 2014.04.22 14:43:02 | hallo welt

So this is the same string in german and english. The german one was changed two seconds after the english one.

I juse this (sub)query to get the machting string. It automatically fallbacks to any language if the requested language does not exist. So for example, this query would fallback to english or german if I requst the string in spain (=id 3):

SELECT
    z.`text`
FROM
    `language_strings` AS z
WHERE
    a.`joined_string_id` = z.`string_id` 
ORDER BY
    IF(z.`language_id` = 3, 1, 0) DESC,
    z.`datetime` DESC
LIMIT
    1

The performance issue here is, that the IF(..., 1, 0) removes a lot of opportunities because the result has to be calculated every time the query is executed.

I tried a lot to improve this query, all useful indexes are still created. MySQL is able to hit this query with the internal cache, but without cache it takes some time to calculate. This is a performance issue when getting a lot of rows (e.g. 1000) because MySQL has to perform 1000 subquerys.

Do you have an idea how to improve this query? Adding new columns to store precalculated data would be an option for me.

Upvotes: 2

Views: 91

Answers (4)

Thomas Kekeisen
Thomas Kekeisen

Reputation: 4406

While I tested all the posted solution and got kinda headache of the complexity of them, I thought there must be a better way to do this. Inspirated by the COALESCE from @Twinkles that I diddn't know before I decided to try the same code with using another, "temporary" table that definitly contains every possible solution.

This little query generates that table and gurantees that there is definitelya entry for every language:

INSERT INTO
    `language_strings_compiled`
(
    `string_id`,
    `language_id`,
    `text`
)
SELECT
    a.`string_id`,
    b.`language_id`,
    (
        SELECT
            z.`text`
        FROM
            `language_strings` AS z
        WHERE
            a.`string_id` = z.`string_id`
        ORDER BY
            IF(z.`language_id` = b.`language_id`, 1, 0) DESC,
            z.`datetime` DESC
        LIMIT 1
    ) AS `text`
FROM
    `language_strings` AS a
JOIN
    `languages` AS b
GROUP BY
    a.`string_id`,
    b.`language_id`

And then, my subquery can look like this:

COALESCE
(
    (
        SELECT
            z.`text`
        FROM
            `language_strings_compiled` AS z
        WHERE
            a.`joined_string_id` = z.`string_id`
        AND
            z.`language_id` = 3
        LIMIT
            1
    ),
    (
        SELECT
            z.`text`
        FROM
            `language_strings` AS z
        WHERE
            a.`joined_string_id` = z.`string_id`
        ORDER BY
            IF(z.`language_id` = 3, 1, 0) DESC,
            z.`datetime` DESC
        LIMIT
            1
    )
)

This solution is 10 times faster than the solution without the "compiled" table. And it is able to fallback to the "old" solution if there are some new language strings that are not known by the compiled table at all.

Thanks for all the solution, I tried them all but everytime I ran into the "sub-sub-query"-problem so far.

Upvotes: 0

Kickstart
Kickstart

Reputation: 21523

This appears to be a correlated sub query, which assuming there are a fair number of rows on the table a would be quite inefficient. Might be better to recode this as joined sub queries.

Maybe as follows:-

SELECT a.*, IFNULL(ls1.`text`, ls2.`text`)
FROM some_table a
LEFT OUTER JOIN 
(
    SELECT string_id, MAX(datetime) AS MaxDateTime
    FROM language_strings
    WHERE language_id = 3
    GROUP BY string_id
) AS MainLanguage1
ON a.joined_string_id = MainLanguage1.string_id
LEFT OUTER JOIN language_strings ls1
ON MainLanguage1.string_id = ls1.string_id AND MainLanguage1.datetime = ls1.MaxDateTime
LEFT OUTER JOIN 
(
    SELECT string_id, MAX(datetime)
    FROM language_strings
    WHERE language_id != 3
    GROUP BY string_id
) AS MainLanguage2
ON a.joined_string_id = MainLanguage2.string_id
LEFT OUTER JOIN language_strings ls2
ON MainLanguage2.string_id = ls2.string_id AND MainLanguage2.datetime = ls2.MaxDateTime

This gets the latest date for a string_id where the language is 3, and then a join to get the matching text to go with it, and the latest date for a a string_id where the language is not 3 and then a join to get the matching text to go with that.

Then the text that is returned is just brought back using IFNULL to bring back the text for language 3, and if not found then the text for languages other than 3.

Upvotes: 1

Twinkles
Twinkles

Reputation: 1994

SELECT COALESCE(primary.`text`,fallback.`text`)
FROM (
  SELECT 1 `ord`, z.`text`, z.`datetime`
  FROM `language_strings` AS z
  WHERE z.`language_id` = 3
) primary
FULL OUTER JOIN
(
  SELECT 2 `ord`, z.`text`, z.`datetime`
  FROM `language_strings` AS z
) fallback
ON (primary.`string_id` = fallback.`string_id`
    AND primary.`string_id` = a.`joined_string_id`)
ORDER BY `ord` ASC, `datetime` DESC
LIMIT 1

Upvotes: 1

StanislavL
StanislavL

Reputation: 57381

(SELECT
    1 as ord, z.`text`
FROM
    `language_strings` AS z
WHERE
    a.`joined_string_id` = z.`string_id` and z.`language_id` = 3
limit 1)
union all
(SELECT
    2 as ord, z.`text`
FROM
    `language_strings` AS z
WHERE
    a.`joined_string_id` = z.`string_id`
ORDER BY
    z.`datetime` DESC
LIMIT 1)
ORDER BY ord
LIMIT 1

Updated. Twinkles thank you for the note.

Upvotes: 1

Related Questions