Reputation: 1697
The VISITS table
CREATE TABLE `VISITS` (
`USER_ID` char(255) COLLATE utf8_unicode_ci NOT NULL,
`VISITED_IN` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
KEY `USER_ID` (`USER_ID`,`VISITED_IN`),
CONSTRAINT `VISITS_ibfk_1` FOREIGN KEY (`USER_ID`) REFERENCES `USERS` (`ID`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
The index I have
KEYNAME | TYPE | UNIQUE | PACKED | COLUMN | CARDINALITY | COLLECTION | NULL |
---|---|---|---|---|---|---|---|
USER_ID | BTREE | NO | NO | USER_ID VISITED_IN |
1 2244 |
A A |
NO NO |
What am I trying to do?
There are 325,899 rows inside the VISITS table and all of these are for a single user. (USER_ID = "C9YAoq")
I counted the number of rows that were added in the last 60 minutes
, 24 hours
, 7 days
, 30 days
, 6 months
, and 12 months
in the same query.
Currently, The query takes between 12 to 15 seconds to count the all rows
The query
SELECT
COUNT(CASE WHEN VISITED_IN >= DATE_SUB(NOW(), INTERVAL 60 MINUTE) THEN 1 END) AS LAST_60_MINUTES,
COUNT(CASE WHEN VISITED_IN >= DATE_SUB(NOW(), INTERVAL 24 HOUR) THEN 1 END) AS LAST_24_HOURS,
COUNT(CASE WHEN VISITED_IN >= DATE_SUB(NOW(), INTERVAL 7 DAY) THEN 1 END) AS LAST_7_DAYS,
COUNT(CASE WHEN VISITED_IN >= DATE_SUB(NOW(), INTERVAL 30 DAY) THEN 1 END) AS LAST_30_DAYS,
COUNT(CASE WHEN VISITED_IN >= DATE_SUB(NOW(), INTERVAL 6 MONTH) THEN 1 END) AS LAST_6_MONTHS,
COUNT(CASE WHEN VISITED_IN >= DATE_SUB(NOW(), INTERVAL 12 MONTH) THEN 1 END) AS LAST_12_MONTHS
FROM VISITS WHERE USER_ID = 'C9YAoq'
How do I handle the rows where the date is more than 12 months?
Simply, I scheduled cron jobs in PHP and that is run every day.
The user is very very rare to reaching to 325,899 visits in 1 year but I'm trying to handle bad scenarios if happened.
How can I make the query faster during counting 325,899 rows?
Also, How can I make the delete faster for expired rows?
Can I improve the index more than that?
Update: According to the comments, I enabled explain and here is the result
ID | SELECT_TYPE | TABLE | PARTITIONS | TYPE | POSSIBLE_KEYS | KEY | KEY_LEN | REF | ROWS | FILTERED | EXTRA |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | SIMPLE | VISITS | NULL | REF | USER_ID | USER_ID | 765 | CONST | 1 | 100.00 | USING INDEX |
Upvotes: 0
Views: 96
Reputation: 2135
I can suggest you a this kind of process. May be this is not the answer for you, but using these kind of summery table will gives you better performance.
Note:
Test
mean my DB name. change it accordingly.Note: better to use "COALESCE(fieldname,0)" for DB fields to ignore any issues from null values.
CREATE TABLE userwise_visit_history ( user_id INT (10), last7days INT (11), last24h INT (11), last6month INT (11), last12month INT (11), created_date TIMESTAMP );
/* Everyday at 12 AM run this query */
DELIMITER $$
CREATE PROCEDURE my_proc_visited_history ()
BEGIN
-- TRUNCATE TABLE `Test`.`userwise_visit_history`;
/* truncate table removed with active user modification */
DELETE FROM userwise_visit_history WHERE user_id IN (SELECT DISTINCT user_id FROM visits
WHERE CAST(visited_in AS DATE) BETWEEN DATE_SUB(TIMESTAMP(CURRENT_DATE), INTERVAL 1 DAY) AND
TIMESTAMP(CURRENT_DATE));
INSERT INTO `Test`.`userwise_visit_history` (`user_id`, `last7days`, `last24h`, `last6month`, `last12month`)
SELECT user_id,
COUNT(CASE WHEN visited_in >= DATE_SUB(NOW(), INTERVAL 7 DAY) THEN 1 END) AS LAST_7_DAYS,
COUNT(CASE WHEN visited_in >= DATE_SUB(NOW(), INTERVAL 30 DAY) THEN 1 END) AS LAST_30_DAYS,
COUNT(CASE WHEN visited_in >= DATE_SUB(NOW(), INTERVAL 6 MONTH) THEN 1 END) AS LAST_6_MONTHS,
COUNT(CASE WHEN visited_in >= DATE_SUB(NOW(), INTERVAL 12 MONTH) THEN 1 END) AS LAST_12_MONTHS
FROM visits
WHERE visited_in < TIMESTAMP(CURRENT_DATE)
/* this section will help to optimize the function process by using only for the active users */
AND user_id IN (SELECT DISTINCT user_id FROM visits
WHERE CAST(visited_in AS DATE) BETWEEN DATE_SUB(TIMESTAMP(CURRENT_DATE), INTERVAL 1 DAY) AND
TIMESTAMP(CURRENT_DATE))
/*end of active user modification */
GROUP BY user_id ;
END$$
DELIMITER ;
added below part to the previous function to more optimize the solution.
AND user_id IN (SELECT DISTINCT user_id FROM visits WHERE CAST(visited_in AS DATE) BETWEEN DATE_SUB(TIMESTAMP(CURRENT_DATE), INTERVAL 1 DAY) AND TIMESTAMP(CURRENT_DATE))
That query part will identify the previous day active users only. They are the only users who need to update the history table and others historical data is keep as it is.
When ever you need to execute your query for a user, run this function
/* To Get the login data use following function*/
DELIMITER $$ CREATE PROCEDURE my_proc_visited_Data(IN usr_id INT) BEGIN DECLARE i_last60min INT; DECLARE i_last24h INT; DECLARE i_last7days INT; DECLARE i_last30Days INT; DECLARE i_last6month INT; DECLARE i_last12month INT; SELECT `last7days`, `last24h`, `last6month`, `last12month` INTO i_last7days, i_last30Days, i_last6month, i_last12month FROM `Test`.`userwise_visit_history` WHERE user_id = usr_id; SELECT COUNT(CASE WHEN visited_in >= DATE_SUB(NOW(), INTERVAL 60 MINUTE) THEN 1 END) AS LAST_60_MINUTES, COUNT(CASE WHEN visited_in >= DATE_SUB(NOW(), INTERVAL 24 HOUR) THEN 1 END) AS LAST_24_HOURS INTO i_last60min, i_last24h FROM visits WHERE user_id = usr_id AND visited_in >= TIMESTAMP(CURRENT_DATE); SELECT i_last60min LAST_60_MINUTES, i_last24h LAST_24_HOURS, i_last7days + i_last24h LAST_7_DAYS, i_last30Days + i_last24h LAST_30_DAYS, i_last6month + i_last24h LAST_6_MONTHS, i_last12month + i_last24h LAST_12_MONTHS; END$$ DELIMITER ;
Run the procedure with the user name
CALL my_proc_visited_Data(1);
Upvotes: 1
Reputation: 4413
First you need to execute an EXPLAIN statement for having execution plan:
EXPLAIN SELECT
COUNT(CASE WHEN VISITED_IN >= DATE_SUB(NOW(), INTERVAL 60 MINUTE) THEN 1 END) AS LAST_60_MINUTES,
COUNT(CASE WHEN VISITED_IN >= DATE_SUB(NOW(), INTERVAL 24 HOUR) THEN 1 END) AS LAST_24_HOURS,
COUNT(CASE WHEN VISITED_IN >= DATE_SUB(NOW(), INTERVAL 7 DAY) THEN 1 END) AS LAST_7_DAYS,
COUNT(CASE WHEN VISITED_IN >= DATE_SUB(NOW(), INTERVAL 30 DAY) THEN 1 END) AS LAST_30_DAYS,
COUNT(CASE WHEN VISITED_IN >= DATE_SUB(NOW(), INTERVAL 6 MONTH) THEN 1 END) AS LAST_6_MONTHS,
COUNT(CASE WHEN VISITED_IN >= DATE_SUB(NOW(), INTERVAL 12 MONTH) THEN 1 END) AS LAST_12_MONTHS
FROM VISITS WHERE USER_ID = 'C9YAoq'
Once you get the execution plan you will have information where the problem come from.
if execution plan is ok perharps you server have a mismatched configuration. So if you want to investigate this i recommand using MySQL profiler to see where the problem come from.
You can also use another approach for solving your problem.
Write a store procedure that execute one query for each of you metrics, so you will have 6 queries in the store procedure and you can return the 6 metrics you need. As queries are more simple and have execution plan more efficient, i think you will get faster.
Upvotes: 2