Taha Sami
Taha Sami

Reputation: 1697

How can I make the query faster during counting 325,899 rows?

The VISITS table

CREATE TABLE `VISITS` (
 `USER_ID` char(255) COLLATE utf8_unicode_ci NOT NULL,
 `VISITED_IN` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
 KEY `USER_ID` (`USER_ID`,`VISITED_IN`),
 CONSTRAINT `VISITS_ibfk_1` FOREIGN KEY (`USER_ID`) REFERENCES `USERS` (`ID`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

The index I have

KEYNAME TYPE UNIQUE PACKED COLUMN CARDINALITY COLLECTION NULL
USER_ID BTREE NO NO USER_ID
VISITED_IN
1
2244
A
A
NO
NO

What am I trying to do?

There are 325,899 rows inside the VISITS table and all of these are for a single user. (USER_ID = "C9YAoq")

I counted the number of rows that were added in the last 60 minutes, 24 hours, 7 days, 30 days, 6 months, and 12 months in the same query.

Currently, The query takes between 12 to 15 seconds to count the all rows

The query

SELECT 
COUNT(CASE WHEN VISITED_IN >= DATE_SUB(NOW(), INTERVAL 60 MINUTE) THEN 1 END) AS LAST_60_MINUTES,
COUNT(CASE WHEN VISITED_IN >= DATE_SUB(NOW(), INTERVAL 24 HOUR) THEN 1 END) AS LAST_24_HOURS,
COUNT(CASE WHEN VISITED_IN >= DATE_SUB(NOW(), INTERVAL 7 DAY) THEN 1 END) AS LAST_7_DAYS,
COUNT(CASE WHEN VISITED_IN >= DATE_SUB(NOW(), INTERVAL 30 DAY) THEN 1 END) AS LAST_30_DAYS,
COUNT(CASE WHEN VISITED_IN >= DATE_SUB(NOW(), INTERVAL 6 MONTH) THEN 1 END) AS LAST_6_MONTHS,
COUNT(CASE WHEN VISITED_IN >= DATE_SUB(NOW(), INTERVAL 12 MONTH) THEN 1 END) AS LAST_12_MONTHS
FROM VISITS WHERE USER_ID = 'C9YAoq'

How do I handle the rows where the date is more than 12 months?

Simply, I scheduled cron jobs in PHP and that is run every day.

The user is very very rare to reaching to 325,899 visits in 1 year but I'm trying to handle bad scenarios if happened.

How can I make the query faster during counting 325,899 rows?

Also, How can I make the delete faster for expired rows?

Can I improve the index more than that?


Update: According to the comments, I enabled explain and here is the result

ID SELECT_TYPE TABLE PARTITIONS TYPE POSSIBLE_KEYS KEY KEY_LEN REF ROWS FILTERED EXTRA
1 SIMPLE VISITS NULL REF USER_ID USER_ID 765 CONST 1 100.00 USING INDEX

Upvotes: 0

Views: 96

Answers (2)

isatsara
isatsara

Reputation: 2135

I can suggest you a this kind of process. May be this is not the answer for you, but using these kind of summery table will gives you better performance.

Note: Test mean my DB name. change it accordingly.

Note: better to use "COALESCE(fieldname,0)" for DB fields to ignore any issues from null values.

  1. Create a table like follows
CREATE TABLE userwise_visit_history (
  user_id INT (10),
  last7days INT (11),
  last24h INT (11),
  last6month INT (11),
  last12month INT (11),
  created_date TIMESTAMP 
);
  1. Daily run this job through a scheduler at 12.00 AM

/* Everyday at 12 AM run this query */

DELIMITER $$

CREATE PROCEDURE my_proc_visited_history ()
BEGIN

-- TRUNCATE TABLE `Test`.`userwise_visit_history`;
/* truncate table removed with active user modification */
DELETE FROM userwise_visit_history WHERE user_id IN  (SELECT DISTINCT user_id FROM visits 
WHERE CAST(visited_in AS DATE) BETWEEN DATE_SUB(TIMESTAMP(CURRENT_DATE), INTERVAL 1 DAY) AND 
TIMESTAMP(CURRENT_DATE));

INSERT INTO `Test`.`userwise_visit_history` (`user_id`, `last7days`, `last24h`, `last6month`, `last12month`)
SELECT user_id,
COUNT(CASE WHEN visited_in >= DATE_SUB(NOW(), INTERVAL 7 DAY) THEN 1 END) AS LAST_7_DAYS,
COUNT(CASE WHEN visited_in >= DATE_SUB(NOW(), INTERVAL 30 DAY) THEN 1 END) AS LAST_30_DAYS,
COUNT(CASE WHEN visited_in >= DATE_SUB(NOW(), INTERVAL 6 MONTH) THEN 1 END) AS LAST_6_MONTHS,
COUNT(CASE WHEN visited_in >= DATE_SUB(NOW(), INTERVAL 12 MONTH) THEN 1 END) AS LAST_12_MONTHS
FROM visits 
WHERE visited_in < TIMESTAMP(CURRENT_DATE)
/* this section will help to optimize the function process by using only for the active users */
AND user_id IN (SELECT DISTINCT user_id FROM visits 
WHERE CAST(visited_in AS DATE) BETWEEN DATE_SUB(TIMESTAMP(CURRENT_DATE), INTERVAL 1 DAY) AND 
TIMESTAMP(CURRENT_DATE))
/*end of active user modification */
GROUP BY user_id ;

 
END$$


DELIMITER ;

added below part to the previous function to more optimize the solution.

AND user_id IN (SELECT DISTINCT user_id FROM visits WHERE CAST(visited_in AS DATE) BETWEEN DATE_SUB(TIMESTAMP(CURRENT_DATE), INTERVAL 1 DAY) AND TIMESTAMP(CURRENT_DATE))

That query part will identify the previous day active users only. They are the only users who need to update the history table and others historical data is keep as it is.

  1. When ever you need to execute your query for a user, run this function

    /* To Get the login data use following function*/

DELIMITER $$

CREATE PROCEDURE my_proc_visited_Data(IN usr_id INT)
BEGIN

DECLARE i_last60min INT;
DECLARE i_last24h INT;
DECLARE i_last7days INT;
DECLARE i_last30Days INT;
DECLARE i_last6month INT;
DECLARE i_last12month INT;

SELECT    `last7days`, `last24h`, `last6month`, `last12month` INTO i_last7days, i_last30Days, i_last6month, i_last12month
  FROM 
  `Test`.`userwise_visit_history` 
  WHERE user_id = usr_id;
  
SELECT
COUNT(CASE WHEN visited_in >= DATE_SUB(NOW(), INTERVAL 60 MINUTE) THEN 1 END) AS LAST_60_MINUTES,
COUNT(CASE WHEN visited_in >= DATE_SUB(NOW(), INTERVAL 24 HOUR) THEN 1 END) AS LAST_24_HOURS
INTO i_last60min, i_last24h
FROM visits WHERE user_id = usr_id
AND visited_in >= TIMESTAMP(CURRENT_DATE);

SELECT i_last60min LAST_60_MINUTES, i_last24h LAST_24_HOURS, i_last7days + i_last24h LAST_7_DAYS, 
i_last30Days + i_last24h LAST_30_DAYS,  i_last6month + i_last24h LAST_6_MONTHS, 
i_last12month + i_last24h LAST_12_MONTHS;

 
END$$

DELIMITER ;
  1. Run the procedure with the user name

    CALL my_proc_visited_Data(1);

enter image description here

Upvotes: 1

Alaindeseine
Alaindeseine

Reputation: 4413

First you need to execute an EXPLAIN statement for having execution plan:

EXPLAIN SELECT 
COUNT(CASE WHEN VISITED_IN >= DATE_SUB(NOW(), INTERVAL 60 MINUTE) THEN 1 END) AS LAST_60_MINUTES,
COUNT(CASE WHEN VISITED_IN >= DATE_SUB(NOW(), INTERVAL 24 HOUR) THEN 1 END) AS LAST_24_HOURS,
COUNT(CASE WHEN VISITED_IN >= DATE_SUB(NOW(), INTERVAL 7 DAY) THEN 1 END) AS LAST_7_DAYS,
COUNT(CASE WHEN VISITED_IN >= DATE_SUB(NOW(), INTERVAL 30 DAY) THEN 1 END) AS LAST_30_DAYS,
COUNT(CASE WHEN VISITED_IN >= DATE_SUB(NOW(), INTERVAL 6 MONTH) THEN 1 END) AS LAST_6_MONTHS,
COUNT(CASE WHEN VISITED_IN >= DATE_SUB(NOW(), INTERVAL 12 MONTH) THEN 1 END) AS LAST_12_MONTHS
FROM VISITS WHERE USER_ID = 'C9YAoq'

Once you get the execution plan you will have information where the problem come from.

if execution plan is ok perharps you server have a mismatched configuration. So if you want to investigate this i recommand using MySQL profiler to see where the problem come from.

You can also use another approach for solving your problem.

Write a store procedure that execute one query for each of you metrics, so you will have 6 queries in the store procedure and you can return the 6 metrics you need. As queries are more simple and have execution plan more efficient, i think you will get faster.

Upvotes: 2

Related Questions