Reputation: 818
I have two huge tables and need the following query be to executed. Both table authors_1
(~20M rows) and authors_2
(~120M rows) have the same structure. I've done my researches and came up with the following query and table structures. It still takes a long time (usually between 10 to 20 seconds) for the query.
Here is the query:
SELECT `fname`, `lname`
FROM (
SELECT `fname`, `lname`
FROM `authors_1`
WHERE 1 AND `lname` LIKE 'AR%'
UNION ALL
SELECT `fname`, `lname`
FROM `authors_2`
WHERE 1 AND `lname` LIKE 'AR%') `a`
GROUP BY CONCAT(`fname`, `lname`)
ORDER BY `lname`
LIMIT 0, 999;
and this is the structure, which is similar for both tables (The FT
indexes are for other queries).
CREATE TABLE `scipers_authors` (
`a_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`linker` varchar(255) COLLATE utf8_persian_ci NOT NULL,
`fname` tinytext COLLATE utf8_persian_ci NOT NULL, /*Should this be tinytext because of FT index or I can use VARCHAR(255) while having FT index?*/
`lname` tinytext COLLATE utf8_persian_ci NOT NULL, /*Same for this one*/
PRIMARY KEY (`a_id`),
UNIQUE KEY `linker` (`linker`) USING BTREE,
KEY `lname_4` (`lname`(4)) USING BTREE,
KEY `name` (`lname`(128),`fname`(128)) USING BTREE,
FULLTEXT KEY `fname` (`fname`),
FULLTEXT KEY `lname` (`lname`),
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_persian_ci;
This is the EXPLAIN
output for the example query shown above (I don't know how to make it beautiful like others, I'm using PMA):
1 PRIMARY <derived2> ALL 476968 Using temporary; Using filesort
2 DERIVED authors_1 range lname_4,name,lname name 386 184800 Using where
3 UNION authors_2 range lname_4,name,lname name 386 292168 Using where
UNION RESULT <union2,3> ALL Using temporary
Any suggestion for improving this query/structure?
Upvotes: 1
Views: 952
Reputation: 781068
Try changing GROUP BY CONCAT(fname, lname)
to GROUP BY fname, lname
. Doing all those concatenations is extra work that's unnecessary.
The only semantic difference between these is if there are two people whose first and last names are different, but they're the same when concatenated, e.g. Freda Smith and Fred Asmith. Such combinations are probably unlikely, and I doubt you really wanted them to be combined into a single result in the first place.
And if you're grouping by the same set of columns you're selecting, you can simply use
SELECT DISTINCT fname, lname
You could instead use UNION DISTINCT
instead of UNION ALL
, since you want duplicates to be removed. Then you don't need to do grouping or DISTINCT
in the outer query.
Upvotes: 3