Reputation: 65
I have a table that has over 100,000,000 rows and I have a query that looks like this:
SELECT
COUNT(IF(created_at >= '2015-07-01 00:00:00', 1, null)) AS 'monthly',
COUNT(IF(created_at >= '2015-07-26 00:00:00', 1, null)) AS 'weekly',
COUNT(IF(created_at >= '2015-06-30 07:57:56', 1, null)) AS '30day',
COUNT(IF(created_at >= '2015-07-29 17:03:44', 1, null)) AS 'recent'
FROM
items
WHERE
user_id = 123456;
The table looks like so:
CREATE TABLE `items` (
`user_id` int(11) NOT NULL,
`item_id` int(11) NOT NULL,
`created_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
PRIMARY KEY (`user_id`,`item_id`),
KEY `user_id` (`user_id`,`created_at`),
KEY `created_at` (`created_at`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
The explain looks fairly harmless, minus the massive row counts:
1 SIMPLE items ref PRIMARY,user_id user_id 4 const 559864 Using index
I use the query to gather counts for a specific user for 4 segments of time. Is there a smarter/faster way to obtain the same data or is my only option to tally these as new rows are put into this table?
Upvotes: 2
Views: 118
Reputation: 142296
INDEX(user_id, created_at)
-- optimalAND created_at >= '2015-06-30 07:57:56'
-- helps because it cuts down on the number of index entries to touchUNION
does not help since it leads to 4 times as much work.SELECTs
does not help for the same reason.Also
COUNT(IF(created_at >= '2015-07-29 17:03:44', 1, null))
can be shortened to
SUM(created_at >= '2015-07-29 17:03:44')
(But probably does not speed it up much)
If the data does not change over time, only new rows are added, then Summary tables of past data would lead to a significant speedup, but only if you could avoid things like '07:57:56' for '30day'. (Why have '00:00:00' for only some of them?) Perhaps the speedup would be another factor of 10 on top of the other changes. Want to discuss further?
(I do not see any advantage in using PARTITION
.)
Upvotes: 0
Reputation: 49049
I would add an index on created_at field:
ALTER TABLE items ADD INDEX idx_created_at (created_at)
or (as Thomas suggested) since you are also filtering for user_id a composite index on created_at and user_id:
ALTER TABLE items ADD INDEX idx_user_created_at (user_id, created_at)
and then I would write your query as:
SELECT 'monthly' as description, COUNT(*) AS cnt FROM items
WHERE created_at >= '2015-07-01 00:00:00' AND user_id = 123456
UNION ALL
SELECT 'weekly' as description, COUNT(*) AS cnt FROM items
WHERE created_at >= '2015-07-26 00:00:00' AND user_id = 123456
UNION ALL
SELECT '30day' as description, COUNT(*) AS cnt FROM items
WHERE created_at >= '2015-06-30 07:57:56' AND user_id = 123456
UNION ALL
SELECT 'recent' as description, COUNT(*) AS cnt FROM items
WHERE created_at >= '2015-07-29 17:03:44' AND user_id = 123456
yes, the output is a little different. Or you can use inline queries:
SELECT
(SELECT COUNT(*) FROM items WHERE created_at>=... AND user_id=...) AS 'monthly',
(SELECT COUNT(*) FROM items WHERE created_at>=... AND user_id=...) AS 'weekly',
...
and if you want an average, you could use a subquery:
SELECT
monthly,
weekly,
monthly / total,
weekly / total
FROM (
SELECT
(SELECT COUNT(*) FROM items WHERE created_at>=... AND user_id=...) AS 'monthly',
(SELECT COUNT(*) FROM items WHERE created_at>=... AND user_id=...) AS 'weekly',
...,
(SELECT COUNT(*) FROM items WHERE user_id=...) AS total
) s
Upvotes: 1
Reputation: 7267
If you have an index on created_at, I would also put in the where clause created_at >= '2015-06-30 07:57:56' which is the lowest date possible in your segment.
Also with the same index it might work splitting in 4 queries:
select count(*) AS '30day'
FROM
items
WHERE
user_id = 123456
and created_at >= '2015-06-30 07:57:56'
union ....
And so on
Upvotes: 2