Reputation: 127
I have the following MySQL query:
SELECT value_1, (
SELECT value_4
FROM table_1
WHERE value_3 < value_1
ORDER BY value_3 DESC
LIMIT 1
)
AS result_value
FROM table_2
WHERE value_1 BETWEEN 1325372400000 AND 1328050800000
ORDER BY value_1
Which returns 32 results and takes 6.6 seconds to run. The idea is to get the entries from table_1 where value_3 is "closest" to value_1 in table_2.
The two queries the query is comprised of, namely
SELECT value_1
AS result_value
FROM table_2
WHERE value_1 BETWEEN 1325372400000 AND 1328050800000
ORDER BY value_1
and (for example)
SELECT value_4
FROM table_1
WHERE value_3 < 1328050800000
ORDER BY value_3 DESC
LIMIT 1
each take 0.03 seconds to run. Computing the accumulated time for 32 result sets, the compound query should take no longer than 1 second, or probably even less (because I/O overhead for individual queries is not accounted for). Yet at 6.6 seconds, it takes much longer.
Why is that, how can I optimize it? Or is there another/superior way to reach my goal?
Update:
Table definitions:
table_1 (MyISAM, 700000 entries):
'id', 'int(10) unsigned', 'NO', 'PRI', NULL, 'auto_increment'
'value_3', 'bigint(20) unsigned', 'NO', 'UNI', '0', ''
'value_4', 'bigint(20) unsigned', 'NO', '', '0', ''
table_2 (MyISAM, 4000 entries):
'value_1', 'bigint(20) unsigned', 'NO', 'PRI', NULL, ''
EXPLAIN [query]:
'1', 'PRIMARY', 'table_2', 'range', 'PRIMARY,value_3_UNIQUE', 'PRIMARY', '8', NULL, '32', 'Using where; Using index'
'2', 'DEPENDENT SUBQUERY', 'table_1', 'index', 'value_3,value_3_value_4', 'value_3', '8', NULL, '1', 'Using where'
Upvotes: 2
Views: 130
Reputation: 115660
If you have an index on value_1
in table_2
and a compound (value3, value4)
index on table_1
, then only the indices will be used by the query.
You could also try this query:
SELECT value_1
, value_4 AS result_value
FROM table_2
JOIN table_1
ON table_1.value_3 =
( SELECT value_3
FROM table_1
WHERE value_3 < value_1
ORDER BY value_3 DESC
LIMIT 1
)
WHERE value_1 BETWEEN 1325372400000 AND 1328050800000
ORDER BY value_1
Upvotes: 1
Reputation: 425448
Use a bit of mysql kung fu:
SELECT * from
(SELECT value_1, value_4
FROM table_2
join table_1 on value_3 < value_1
WHERE value_1 BETWEEN 1325372400000 AND 1328050800000
ORDER BY value_1, value_3 DESC) x
GROUP BY value_1
ORDER BY value_1
The "kung fu" is that with mysql when the columns you're not grouping by are not aggregated (eg SUM()
etc), you get the first row encountered for each group. If you use this technique on an ordered result set, you can get the value you're after.
Not only is this easier to code, but you'll notice that just one pass is made over the tables (not one query per row as you tried). It should perform well.
There has been conjecture by some commenters that this special form of GROUP BY
is "indeterminate" and/or "not officially supported" etc. The documentation states that the row selected in "indeterminate", however I have never seen nor heard of mysql selecting any row other than the first encountered and using an order by
on the inner select is used and relied upon in countless production queries.
FWIW, I am happy to recommend this approach as being "reliable" and production-worthy.
Upvotes: 1
Reputation: 9320
Here Iam just avoiding use of virtual(derived) table. The main reason behind virtual table usage was sorting which has been achieved by applying order by clause within the groupcconcat().
SELECT SUBSTRING_INDEX(group_concat(value_1
ORDER BY value_1, value_3 DESC),',',1) as value_1,
SUBSTRING_INDEX(group_concat(value_4
ORDER BY value_1, value_3 DESC),',',1) as value_4
FROM table_2 join table_1 on value_3 < value_1
WHERE value_1 BETWEEN 1325372400000 AND 1328050800000
GROUP BY value_1
ORDER BY value_1
Upvotes: 0