Reputation: 989
I have two large tables I have to join
First one:
CREATE TABLE IF NOT EXISTS `cdr` (
`calldate` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
`clid` varchar(80) NOT NULL DEFAULT '',
`src` varchar(80) NOT NULL DEFAULT '',
`dst` varchar(80) NOT NULL DEFAULT '',
`dcontext` varchar(80) NOT NULL DEFAULT '',
`channel` varchar(80) NOT NULL DEFAULT '',
`dstchannel` varchar(80) NOT NULL DEFAULT '',
`lastapp` varchar(80) NOT NULL DEFAULT '',
`lastdata` varchar(80) NOT NULL DEFAULT '',
`duration` decimal(11,6) NOT NULL DEFAULT '0.000000',
`billsec` decimal(11,6) NOT NULL DEFAULT '0.000000',
`disposition` varchar(45) NOT NULL DEFAULT '',
`amaflags` int(11) NOT NULL DEFAULT '0',
`accountcode` varchar(20) NOT NULL DEFAULT '',
`uniqueid` varchar(32) NOT NULL DEFAULT '',
`userfield` varchar(255) NOT NULL DEFAULT '',
`cost` char(20) NOT NULL DEFAULT 'none',
`zone` char(60) NOT NULL DEFAULT 'none',
`profile` char(10) NOT NULL DEFAULT 'none',
`tariff` char(8) NOT NULL DEFAULT 'none',
`status` char(10) NOT NULL DEFAULT 'none',
`answer` datetime NOT NULL,
`end` datetime NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
--
-- Indexes for table `cdr`
--
ALTER TABLE `cdr`
ADD KEY `src` (`src`),
ADD KEY `accountcode` (`accountcode`),
ADD KEY `status` (`status`),
ADD KEY `uniqueid` (`uniqueid`),
ADD KEY `calldate` (`calldate`);
Second one
CREATE TABLE IF NOT EXISTS `routes` (
`id` int(4) NOT NULL,
`route` char(35) NOT NULL,
`zonenum` int(4) NOT NULL,
`comment` char(50) CHARACTER SET latin2 DEFAULT NULL,
`status` tinyint(1) NOT NULL DEFAULT '1',
`wholesaledst` varchar(60) NOT NULL,
`nabava` char(10) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
ALTER TABLE `routes`
ADD PRIMARY KEY (`id`), ADD UNIQUE KEY `route` (`route`), ADD KEY `zonenum` (`zonenum`);
First table contains around 3.5 millions row and second one around 35 000. For every record in first table I have to get zonenum from second table.
Here is my query
SELECT src, accountcode, zonenum, calldate, answer, end
FROM cdr
LEFT OUTER JOIN routes ON src LIKE route
WHERE calldate BETWEEN '2015-03-01' AND '2015-04-01' AND status = 'INCOMING' AND accountcode != 110 AND disposition = 'ANSWERED';
For example src looks something like 095346435 and route looks like 095%
Query need around 7 mins to execute. If I remove join it's only 1.5 second.
When checking mysql slow query log it says that query has examined 1.4 million rows, which is circa number of rows after where clause * number of rows in routes
table. I've tried using subquery, temporary table.. everything but it's always way too slow
Is there any way I can speed up query? Or did I missed some index? Please help I'm desperate.
UPDATE
Here is EXPLAIN
result if it helps
Upvotes: 1
Views: 1259
Reputation: 15961
Since routes
is so much smaller, it might be worth JOINing to a subquery containing it instead. In that query, you could SELECT LEFT(route, 3) AS rtPre, zonenum
and join to rtPre, like so:
SELECT cdr.src, cdr.accountcode, r.zonenum, cdr.calldate, cdr.answer, cdr.end
FROM cdr
LEFT JOIN (
SELECT LEFT(route, 3) AS rtPre, zonenum
FROM routes
) AS r ON LEFT(cdr.src, 3) = r.rtPre
WHERE cdr.calldate BETWEEN '2015-03-01' AND '2015-04-01'
AND cdr.status = 'INCOMING'
AND cdr.accountcode != 110
AND cdr.disposition = 'ANSWERED'
;
If that still doesn't help, you could INSERT that subquery into a temporary table, with an index on rtPre
; and use that table in the JOIN.
If this kind of query is going to be run frequently, you might even want to consider a permanent "prefix" field in routes
that could be indexed.
...Of course, this made the huge assumption that all the cdr.src values would be 3 characters and a %. (If that was a bad assumption, a prefix type solution might still be usable. LEFT(cdr.src, [standard prefix length]) = r.rtPre AND r.route LIKE cdr.src
could take advantage of a minimum prefix to reduce the LIKE
comparisons needed.
Upvotes: 2
Reputation: 5336
Your query is not using any keys for lookup in the route table. index in the EXPLAIN means index scan is performed, it is still bad because we have to look at every record.
Note that src LIKE route
cannot use a key. As CindyH pointed out, src = route
would be better. Of course it will give different results. Did you really need src LIKE route
in your logic, or would = be good enough?
If you do need LIKE, the solution will depend on why, but it will involve building some form of a FULLTEXT index on route, either with native FULLTEXT or some manual construction using your own table of relevant substrings. Or perhaps some other creative solution.
Edit:
Based on the additional info, and assuming that all routes are just prefixes, I propose the following "creative" solution:
src LIKE route
with concat(whatever computes the prefix, '%') = route
this will use the key on route.
Upvotes: 1
Reputation: 48179
Learn about building indexes that best match what you are asking for. You need a compound index (sometimes also COVERING indexes) depending on how it works best for your tables / situations.
Since you are looking at your call-data-record for specific dates and status, I would suggest an index on
cdr table index ( disposition, status, calldate, accountcode )
a covering index to include your JOIN criteria would include the SRC column too such as...
cdr table index ( disposition, status, calldate, accountcode, src )
Think of indexes as a way to optimize sorting. If you have a two rooms and each room has a copy of every call data record.
Room A is sorted by only 1 column, such as the account code, you have to go to each group of account, then find for the status, zone, date range.
Room B, has it multi-sorted per the sample index above. So, in the room, assume there are file cabinets. Each cabinet has its own disposition, so you jump directly to those "ANSWERED" and ignore the others.
In that cabinet, are two drawers... INCOMING and OUTGOING.... So now you are down to 1 drawer.
From that drawer, they are sorted by date. So now you can jump directly to the date range you want the qualified records from. Then its a matter of SKIPPING OVER all those that account code 110...
That is a premise of how to THINK about how the indexes work. Never just rely on single columns across the board, you need them to work in conjunction to the others based on your queries.
Since you need to get the ZONE Number from the routes table, that should have a COVERING index on
index for the ROUTES table ... ( route, zonenum )
so the engine does not have to go back to the raw data pages to get the zone. It is part of the index and can return that directly. Again, don't rely on a single column for the index if you need the second part too.
Upvotes: 0