Ryan Doherty
Ryan Doherty

Reputation: 38740

MySQL left outer join is slow

hoping to get some help with this query, I've worked at it for a while now and can't get it any faster:

SELECT date, count(id) as 'visits' FROM dates 
LEFT OUTER JOIN visits 
ON (dates.date = DATE(visits.start) and account_id = 40 ) 
WHERE date >= '2010-12-13' AND date <= '2011-1-13' 
GROUP BY date ORDER BY date ASC

That query takes about 8 seconds to run. I've added indexes on dates.date, visits.start, visits.account_id and visits.start+visits.account_id and can't get it to run any faster.

Table structure (only showing relevant columns in visit table):

create table visits (
    `id` int(11) NOT NULL AUTO_INCREMENT,
    `account_id` int(11) NOT NULL,
    `start` DATETIME NOT NULL,
    `end` DATETIME NULL,
    PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

CREATE TABLE `dates` (
  `date` date NOT NULL,
  PRIMARY KEY (`date`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

dates table contains all days from 2010-1-1 to 2020-1-1 (~3k rows). visits table contains about 400k rows dating from 2010-6-1 to yesterday. I'm using the date table so the join will return 0 visits for days there were no visits.

Results I want for reference:

+------------+--------+
| date       | visits |
+------------+--------+
| 2010-12-13 |    301 |
| 2010-12-14 |    356 |
| 2010-12-15 |    423 |
| 2010-12-16 |    332 |
| 2010-12-17 |    346 |
| 2010-12-18 |    226 |
| 2010-12-19 |    213 |
| 2010-12-20 |    311 |
| 2010-12-21 |    273 |
| 2010-12-22 |    286 |
| 2010-12-23 |    241 |
| 2010-12-24 |    149 |
| 2010-12-25 |    102 |
| 2010-12-26 |    174 |
| 2010-12-27 |    258 |
| 2010-12-28 |    348 |
| 2010-12-29 |    392 |
| 2010-12-30 |    395 |
| 2010-12-31 |    278 |
| 2011-01-01 |    241 |
| 2011-01-02 |    295 |
| 2011-01-03 |    369 |
| 2011-01-04 |    438 |
| 2011-01-05 |    393 |
| 2011-01-06 |    368 |
| 2011-01-07 |    435 |
| 2011-01-08 |    313 |
| 2011-01-09 |    250 |
| 2011-01-10 |    345 |
| 2011-01-11 |    387 |
| 2011-01-12 |      0 |
| 2011-01-13 |      0 |
+------------+--------+

Thanks in advance for any help!

Upvotes: 3

Views: 3076

Answers (3)

jny
jny

Reputation: 8057

How about something like that: outer join on the result of select from eumiro?

SELECT date, v.visits as 'visits' FROM dates 
LEFT OUTER JOIN (SELECT DATE(start) as dt, count(id) as 'visits'
FROM visits 
WHERE account_id = 40
AND date BETWEEN '2010-12-13' AND '2011-01-13' 
GROUP BY DATE(start)
ORDER BY 1)
v
ON (dates.date = v.dt ) 
WHERE date >= '2010-12-13' AND date <= '2011-1-13' 

Edit: edited SQL Edit: another option - inline select, something like that:

SELECT date, (select count(*)  as 'visits' 
FROM  from visits 
where date = DATE(visits.start) and account_id = 40 ) 
) from dates
WHERE date >= '2010-12-13' AND date <= '2011-1-13' 
ORDER BY date ASC

Upvotes: 0

Eric Petroelje
Eric Petroelje

Reputation: 60508

Your problem is here:

ON (dates.date = DATE(visits.start) and account_id = 40 ) 

Because you are using the DATE function on visits.start, MySQL is unable to use an index for the join.

Probably the best solution would be to add a start_date and end_date column to the dates table and index those columns. So for a row with a date of 2011-01-01, the start date would be 2011-01-01 00:00:00 and the end date would be 2011-01-01 23:59:59.

Then you can join directly to the dates table like so:

SELECT date, count(id) as 'visits' FROM dates 
LEFT OUTER JOIN visits 
ON (visits.start BETWEEN dates.start_date AND dates.end_date and account_id = 40 ) 
WHERE date >= '2010-12-13' AND date <= '2011-1-13' 
GROUP BY date ORDER BY date ASC

Another option would be to store the date and time parts separately on the visits table, and join using just the date part.

Upvotes: 4

GolezTrol
GolezTrol

Reputation: 116110

I think it is mainly slow because of the DATE() function. You could add a date column to Visits that stores the whole date and write a trigger to automatically update it when a Visit is inserted or its datetime is updated. That will allow MySQL to make better use of the indexes that are used in the join.

Upvotes: 0

Related Questions