How to optimize my query about join 3 tables in same table?

I want to get id customer who buy my product every month from three months ago. Today is 2020-02-15. So i want get customer who buy in November 2019, December 2019, January 2020.

I have only 1 table order (MySQL) like this:

Order Table (Primary Key = ID (Auto Increment) ):

-----------------------------------------------
|      ID      |    id_cust  |    buy_date    |
-----------------------------------------------
|       1      |       10    |   2019-11-01   | 
|       2      |       11    |   2019-11-10   |
|       3      |       10    |   2019-12-11   |
|       4      |       12    |   2019-12-12   |
|       5      |       10    |   2020-01-13   |
|       6      |       11    |   2020-01-14   |
|       7      |       12    |   2020-01-15   |
-----------------------------------------------

Based on what i want, the answer is id_cust 10

I've tried it and get results like this:

SELECT g1.`id_cust`
FROM `orders` g1 
    JOIN `orders` g2
    ON g2.`id_cust`   = g1.`id_cust`
      AND g2.`buy_date` >= STR_TO_DATE(CONCAT('01-', LPAD(MONTH(DATE_SUB(NOW(), INTERVAL 2 MONTH)), 2, '0'), '-', YEAR(DATE_SUB(NOW(), INTERVAL 2 MONTH))), '%d-%m-%Y')
      AND g2.`buy_date` < STR_TO_DATE(CONCAT('01-', LPAD(MONTH(DATE_SUB(NOW(), INTERVAL 1 MONTH)), 2, '0'), '-', YEAR(DATE_SUB(NOW(), INTERVAL 1 MONTH))), '%d-%m-%Y')
    JOIN `orders` g3
    ON g3.`id_cust`   = g1.`id_cust`
      AND g3.`id_cust`   = g2.`id_cust`
      AND g3.`buy_date` >= STR_TO_DATE(CONCAT('01-', LPAD(MONTH(DATE_SUB(NOW(), INTERVAL 1 MONTH)), 2, '0'), '-', YEAR(DATE_SUB(NOW(), INTERVAL 1 MONTH))), '%d-%m-%Y')
      AND g3.`buy_date` < STR_TO_DATE(CONCAT('01-', LPAD(MONTH(NOW()), 2, '0'), '-', YEAR(NOW())), '%d-%m-%Y')
WHERE g1.`buy_date` >= STR_TO_DATE(CONCAT('01-', LPAD(MONTH(DATE_SUB(NOW(), INTERVAL 3 MONTH)), 2, '0'), '-', YEAR(DATE_SUB(NOW(), INTERVAL 3 MONTH))), '%d-%m-%Y')
AND g1.`buy_date` < STR_TO_DATE(CONCAT('01-', LPAD(MONTH(DATE_SUB(NOW(), INTERVAL 2 MONTH)), 2, '0'), '-', YEAR(DATE_SUB(NOW(), INTERVAL 2 MONTH))), '%d-%m-%Y')
GROUP BY g1.`id_cust`

Please help me to simplify my syntax because this is very slow when it is run on a lot of data or if I am wrong in my syntax, please correct my syntax.

Upvotes: 0

Views: 88

Answers (2)

Gordon Linoff
Gordon Linoff

Reputation: 1269873

How about this?

select c.id_cust
from (select o.id_cust, year(buy_date) as yyyy, month(buy_date) as mm,
             row_number() over (partition by o.id_cust) as month_counter
      from orders o
      where buy_date >= date_format(current_date - interval 3 month, '%Y-%m-%d') and
            buy_date < date_format(current_date, '%Y-%m-%d')
      group by id_cust, yyyy, mm
     ) c
where month_counter = 3;

What this does is filter down to only the three months you care about. Then it aggregates by the year and month and returns only the third row.

Actually, this is more easily expressed as:

select o.id_cust
from orders o
where buy_date >= date_format(current_date - interval 3 month, '%Y-%m-%d') and
      buy_date < date_format(current_date, '%Y-%m-%d')
group by o.id_cust
having count(distinct year(buy_date), month(buy_date)) = 3;

Upvotes: 1

Paul Spiegel
Paul Spiegel

Reputation: 31802

I would use Gordon's second query. But if your code works (as an exercise) you might optimize the execution time by creating indices on (buy_date, id_cust) and on (id_cust, buy_date). The first one for the WHERE clause and the second for the ON clauses.

With this schema

CREATE TABLE orders (
  `ID` INTEGER primary key,
  `id_cust` INTEGER,
  `buy_date` VARCHAR(10)
);

The EXPLAIN result of your query is

| id  | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra                                              |
| --- | ----------- | ----- | ---------- | ---- | ------------- | --- | ------- | --- | ---- | -------- | -------------------------------------------------- |
| 1   | SIMPLE      | g1    |            | ALL  |               |     |         |     | 7    | 14.29    | Using where; Using temporary; Using filesort       |
| 1   | SIMPLE      | g2    |            | ALL  |               |     |         |     | 7    | 14.29    | Using where; Using join buffer (Block Nested Loop) |
| 1   | SIMPLE      | g3    |            | ALL  |               |     |         |     | 7    | 14.29    | Using where; Using join buffer (Block Nested Loop) |

No key is used and "Block Nested Loop" sounds very bad.

After adding the indices

ALTER TABLE orders ADD INDEX (buy_date, id_cust);
ALTER TABLE orders ADD INDEX (id_cust, buy_date);
| id  | select_type | table | partitions | type  | possible_keys    | key     | key_len | ref             | rows | filtered | Extra                    |
| --- | ----------- | ----- | ---------- | ----- | ---------------- | ------- | ------- | --------------- | ---- | -------- | ------------------------ |
| 1   | SIMPLE      | g1    |            | index | buy_date,id_cust | id_cust | 48      |                 | 7    | 14.29    | Using where; Using index |
| 1   | SIMPLE      | g2    |            | ref   | buy_date,id_cust | id_cust | 5       | test.g1.id_cust | 2    | 14.29    | Using where; Using index |
| 1   | SIMPLE      | g3    |            | ref   | buy_date,id_cust | id_cust | 5       | test.g1.id_cust | 2    | 14.29    | Using where; Using index |

db-fiddle

It looks much better now, though it desides not to use my first index (probably due to GROUP BY).

Then I would simplify the query to:

SELECT DISTINCT g1.id_cust
FROM orders g1 
    JOIN orders g2 ON g2.id_cust = g1.id_cust
    JOIN orders g3 ON g3.id_cust = g1.id_cust
    -- AND g3.id_cust  = g2.id_cust -- redundant condition
WHERE g1.buy_date >= DATE_FORMAT(NOW() - INTERVAL 3 MONTH, '%Y-%m-01')
  AND g1.buy_date <  DATE_FORMAT(NOW() - INTERVAL 2 MONTH, '%Y-%m-01')
  AND g2.buy_date >= DATE_FORMAT(NOW() - INTERVAL 2 MONTH, '%Y-%m-01')
  AND g2.buy_date <  DATE_FORMAT(NOW() - INTERVAL 1 MONTH, '%Y-%m-01')
  AND g3.buy_date >= DATE_FORMAT(NOW() - INTERVAL 1 MONTH, '%Y-%m-01')
  AND g3.buy_date <  DATE_FORMAT(NOW() - INTERVAL 0 MONTH, '%Y-%m-01')
-- GROUP BY g1.id_cust -- You can use DISTINCT instead

db-fiddle

Upvotes: 0

Related Questions