Reputation: 287
I'm running the exact same query four times, twice as a subquery, gathering different information each time. What is the best way to pass the results of the first query to the other three so it doesn't have to run three more times?
On average, it returns around 2,000 rows, but it can be anywhere from 0 (in which case I skip the other three) to all of them. The primary table has nearly 300,000 rows, is growing by about 800 per day, rows are never deleted, and thousands of rows are updated throughout the day, many multiple times.
I looked into query cache, but it doesn't look like it has a bright future:
disabled-by-default since MySQL 5.6 / MariaDB 10.1.7
deprecated as of MySQL 5.7.20
removed in MySQL 8.0
I considered using GROUP_CONCAT with IN, but somehow I doubt that would work very well (if at all) with larger queries.
This is in a library I use to format the results for other scripts, so the original query can be nearly anything. Usually, it is on indexed columns, but can be horribly complicated using stored functions and take several minutes. It always involves the primary table, but may also join in other tables (but only to filter results from the primary table).
I am using Perl 5.16 and MariaDB 10.1.32 (will upgrade to 10.2 shortly) on CentOS 7. I am using prepare_cached and placeholders. The user this library runs as has SELECT-only access to tables plus EXECUTE on a couple stored functions, but I can change that if needed.
I've minimized the below as much as I can and used metasyntactic variables (inside angle brackets) as much as possible in an attempt to make the logic clear. id is 16 bytes and the primary key of the primary table (labeled a below).
I'm accepting three parameters as input. <tables> always includes a and may include a join like a join b on a.id=b.id. <where> can be simple like e=3 or horribly complex. I'm also getting an array of data for the placeholders, but I've left that out of the below because it doesn't affect the logic.
<search> = FROM <tables> WHERE (<where>)
<foo> = k < NOW() - INTERVAL 3 HOUR
<bar> = j IS NOT NULL OR <foo>
<baz> = j IS NULL AND k > NOW() - INTERVAL 3 HOUR
so <baz> is !<bar>. Every row should match one or the other
<where> often includes 1 or more of foo/bar/baz
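A quick way to check the claim that every row matches either <bar> or <baz> is to evaluate both predicates for a few hand-picked rows. The sketch below is only an illustration under stated assumptions: it uses Python's sqlite3 module instead of MariaDB, and an integer CUTOFF stands in for NOW() - INTERVAL 3 HOUR. The sample values and labels are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
CUTOFF = 100  # assumption: integer stand-in for NOW() - INTERVAL 3 HOUR

# (label, j, k) sample rows; all values are made up for illustration.
cases = [
    ("old, j NULL", None, 50),
    ("recent, j NULL", None, 150),
    ("recent, j set", 1, 150),
    ("k exactly at cutoff", None, 100),
    ("k NULL", None, None),
]

results = {}
for label, j, k in cases:
    # First column is <bar>, second column is <baz>.
    bar, baz = conn.execute(
        "SELECT (:j IS NOT NULL OR :k < :cutoff), (:j IS NULL AND :k > :cutoff)",
        {"j": j, "k": k, "cutoff": CUTOFF},
    ).fetchone()
    results[label] = (bar, baz)

for label, pair in results.items():
    print(label, pair)
```

In this sketch the first three rows match exactly one predicate, but a row whose k sits exactly on the cutoff matches neither, and a row where both j and k are NULL yields NULL for both. If k can be NULL in the real table, or an exact tie matters, the two predicates are not strict complements.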
SELECT a.id, b, c, d, <foo> x <search> ORDER BY e, id
SELECT COUNT(*) <search> AND <baz>
I really only need to know if any of the above rows match <baz>
SELECT c, COUNT(*) t, SUM(<bar>) o FROM a WHERE c IN (SELECT c <search> GROUP BY c) GROUP BY c
SELECT d, COUNT(*) t, SUM(<bar>) o FROM a WHERE d IN (SELECT d <search> GROUP BY d) GROUP BY d
The last two get a list of all unique c or d from the rows in the original query and then count how many total rows (not just the ones in the original query) have matching c or d and how many of those match <bar>. Those results get dumped into hashes so I can look up those counts while I iterate through the rows from the original query. I'm thinking running those two queries once is more efficient than running two smaller queries for each row.
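The lookup pattern described above can be sketched as follows. This is a minimal illustration, not the library's actual code: it uses Python's sqlite3 module rather than Perl and MariaDB, a made-up five-row table a, e = 3 as the <where>, and j IS NOT NULL as a simplified stand-in for <bar>.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY, c TEXT, e INTEGER, j INTEGER);
    INSERT INTO a VALUES
        (1, 'red', 3, NULL), (2, 'red', 3, 10), (3, 'blue', 5, NULL),
        (4, 'blue', 3, 20), (5, 'red', 7, 30);
""")

search = "FROM a WHERE (e = 3)"  # hypothetical <search>

# One grouped query builds the lookup: c -> (total rows, rows matching "bar").
totals = {
    c: (t, o)
    for c, t, o in conn.execute(
        f"SELECT c, COUNT(*) t, SUM(j IS NOT NULL) o FROM a "
        f"WHERE c IN (SELECT c {search}) GROUP BY c"
    )
}

# The main result set is then decorated from the lookup, with no per-row query.
report = [
    (row_id, c) + totals[c]
    for row_id, c in conn.execute(f"SELECT a.id, c {search} ORDER BY e, id")
]
print(report)
```

With N rows in the main result, this issues two queries in total instead of one extra query per row, which is the trade-off the paragraph above argues for.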
Thank you.
Edited to add solution:
A temporary table was the answer, just not quite in the way Raymond suggested. Using EXPLAIN on my queries indicates that MariaDB was already using a temporary table for each, and deleting it when each was complete.
An inner join only returns rows that exist in both tables. So by making a temporary table of IDs that match my first SELECT, and then joining it to the primary table for the other SELECTs, I only get the data I want, without having to copy all that data to the temporary table.
"To create a temporary table, you must have the CREATE TEMPORARY TABLES privilege. After a session has created a temporary table, the server performs no further privilege checks on the table. The creating session can perform any operation on the table, such as DROP TABLE, INSERT, UPDATE, or SELECT." - https://dev.mysql.com/doc/refman/5.7/en/create-temporary-table.html
I also figured out that GROUP BY sorts by default, and you can get better performance if you don't need the data sorted by telling it not to.
DROP TEMPORARY TABLE IF EXISTS `temp`;
CREATE TEMPORARY TABLE temp AS ( SELECT a.id FROM <tables> WHERE <where> );
SELECT a.id, b, c, d, <foo> x FROM a JOIN temp ON a.id=temp.id ORDER BY e, id;
SELECT COUNT(*) FROM a JOIN temp ON a.id=temp.id WHERE <baz>;
SELECT c, COUNT(*) t, SUM(<bar>) o FROM a WHERE c IN (SELECT c FROM a JOIN temp ON a.id=temp.id GROUP BY c ORDER BY NULL) GROUP BY c ORDER BY NULL;
SELECT d, COUNT(*) t, SUM(<bar>) o FROM a WHERE d IN (SELECT d FROM a JOIN temp ON a.id=temp.id GROUP BY d ORDER BY NULL) GROUP BY d ORDER BY NULL;
DROP TEMPORARY TABLE IF EXISTS `temp`;
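For anyone who wants to try the idea without a MariaDB instance, here is a small sketch of the same technique using Python's sqlite3 module. Everything in it is an assumption for illustration: the sample rows, the table name matched_ids (used instead of temp), and e = 3 as the <where>. It also shows why each join back to the primary table needs a join condition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY, c TEXT, d TEXT, e INTEGER);
    INSERT INTO a VALUES
        (1, 'red', 'x', 3), (2, 'red', 'y', 3), (3, 'blue', 'x', 5),
        (4, 'blue', 'y', 3), (5, 'red', 'x', 7);
""")

# Run the (potentially expensive) search once and keep only the primary keys.
conn.execute("CREATE TEMPORARY TABLE matched_ids AS SELECT a.id FROM a WHERE e = 3")

# Reuse the id list: the inner join restricts a to the rows found by the search.
rows = conn.execute(
    "SELECT a.id, c, d FROM a JOIN matched_ids ON a.id = matched_ids.id "
    "ORDER BY e, a.id"
).fetchall()

joined = conn.execute(
    "SELECT COUNT(*) FROM a JOIN matched_ids ON a.id = matched_ids.id"
).fetchone()[0]

# Without a join condition, every row of a pairs with every saved id.
crossed = conn.execute("SELECT COUNT(*) FROM a JOIN matched_ids").fetchone()[0]

conn.execute("DROP TABLE matched_ids")
print(rows, joined, crossed)
```

Only the ids are copied, so the temporary table stays small even when the primary table rows are wide. The last count illustrates that a join with no ON clause is a cross product, so the filter from the search is lost.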
Upvotes: 0
Views: 1467
Reputation: 11602
The best I could think of is to use a TEMPORARY table.
P.S. I'm using valid MySQL code mixed with the same pseudocode as the topic starter.
CREATE TEMPORARY TABLE <name> AS ( SELECT a.id, b, c, d, e, j, k FROM <tables> WHERE (<where>) )
<foo> = k < NOW() - INTERVAL 3 HOUR
<bar> = j IS NOT NULL OR <foo>
<baz> = j IS NULL AND k > NOW() - INTERVAL 3 HOUR
so <baz> is !<bar>. Every row should match one or the other
<where> often includes 1 or more of foo/bar/baz
SELECT id, b, c, d, <foo> x FROM <name> ORDER BY e, id
SELECT COUNT(*) FROM <name> WHERE <baz>
SELECT c, COUNT(*) t, SUM(<bar>) o FROM a WHERE c IN (SELECT c FROM <name> GROUP BY c) GROUP BY c
SELECT d, COUNT(*) t, SUM(<bar>) o FROM a WHERE d IN (SELECT d FROM <name> GROUP BY d) GROUP BY d
Upvotes: 2