Reputation: 505

MySQL - Get users who have NO orders in current month but don have in previous, in one request

I need to get users count, grouped by user type (A,B,C) and every month (that exist in db) in current year - only for users who don't have paid orders (with total > 0) in every month (every row returned by SQL), but have orders (with total > 0) in any previous months (in any year, not just current). In other words this is inactive users, who placed some paid order before, but don't placed any new orders in current SQL request row month returned.

What I expect to get in results (values are just examples):

label   user_type   data
 Nov    B           2
 Nov    A           1
 Nov    C           3
 Dec    C           1
 .... other months

This means that in December there are 5 users with user type A and 3 users with user type B and 0 users with user type C, who DON'T placed orders in December 2021, but placed orders sometime before December in any year.

Sample DB (two tables - users and orders) with SQL that show number users, by every user type, in every month, who placed orders in this month. Instead of just this simple results, I need to get users counts that DON'T placed orders in this month, but placed paid orders somewhere before.

https://dbfiddle.uk/?rdbms=mysql_5.6&fiddle=4c4fadf67bcdc7cc3443f46c387173df

I need SQL that will work with MySQL 5.7

Upvotes: 3

Answers (3)

user1191247

Reputation: 12973

This uses a similar approach to VeteranSlayer but it starts with the cross join between months and users followed by the left join to orders. It also uses ranges for the date comparisons instead of the functions. It may perform really badly but it should give the correct result -

SELECT
    months.month AS `label`,
    u.user_type,
    COUNT(u.userid) AS `data`
FROM (
    SELECT 'Jan' `month`, '2021-01-01' month_start, '2021-01-31' month_end UNION ALL
    SELECT 'Feb', '2021-02-01', '2021-02-28' UNION ALL
    SELECT 'Mar', '2021-03-01', '2021-03-31' UNION ALL
    SELECT 'Apr', '2021-04-01', '2021-04-30' UNION ALL
    SELECT 'May', '2021-05-01', '2021-05-31' UNION ALL
    SELECT 'Jun', '2021-06-01', '2021-06-30' UNION ALL
    SELECT 'Jul', '2021-07-01', '2021-07-31' UNION ALL
    SELECT 'Aug', '2021-08-01', '2021-08-31' UNION ALL
    SELECT 'Sep', '2021-09-01', '2021-09-30' UNION ALL
    SELECT 'Oct', '2021-10-01', '2021-10-31' UNION ALL
    SELECT 'Nov', '2021-11-01', '2021-11-30' UNION ALL
    SELECT 'Dec', '2021-12-01', '2021-12-31'
) months
INNER JOIN users u
LEFT JOIN orders o
    ON o.date BETWEEN months.month_start AND months.month_end
    AND o.user_id = u.userid
WHERE o.user_id IS NULL
AND EXISTS (
    SELECT DISTINCT o1.user_id 
    FROM orders o1 
    WHERE o1.date < months.month_start
    AND  o1.user_id = u.userid
)
GROUP BY months.month, u.user_type
ORDER BY months.month_start ASC, u.user_type ASC;

EDIT

The performance of these queries varies dramatically based on the scale of the dataset, the distribution of data and the indices. I have done some tests with many different index variations and the following test datasets. Note the random data created in the two tables can lead to wildly different performance. The dummy table referenced in the SELECTs of the INSERTs is just a random table with 1M rows.

CREATE TABLE `users` (
    `id` int unsigned NOT NULL AUTO_INCREMENT PRIMARY KEY,
    `user_type` char(1) NOT NULL,
    KEY `IDX_user_type` (`user_type`)
);

INSERT INTO users (user_type)
SELECT 
    CASE (FLOOR(RAND() * 3) + 1) WHEN 1 THEN 'A' WHEN 2 THEN 'B' ELSE 'C' END AS `user_type`
FROM dummy
LIMIT 1000;


CREATE TABLE orders (
    `id` int UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    `user_id` int,
    `date` DATE,
    `total` DECIMAL(6,2),
    KEY `IDX_user_id_date` (`user_id`,`date`)
);

INSERT INTO orders (user_id, date, total)
SELECT
     (FLOOR(RAND() * 1000) + 1) AS `user_id`,
     ('2020-01-01' + INTERVAL FLOOR(RAND() * 685) + 1 DAY) AS `date`,
     ( (FLOOR(RAND() * 10) + 1) * 5) AS `total`
FROM dummy
LIMIT 100000;

The most significant performance difference across the queries came from adding -

KEY `IDX_user_id_date` (`user_id`,`date`)

and adding the user_type index gave a small but consistent improvement -

KEY `IDX_user_type` (`user_type`)

ProGu's query executed consistently with an average time of 1.466 sec. And my query was similarly consistent at 0.922 sec. Your mileage will vary!

I haven't included time's for VeteranSlayer's query as it returned radically different results.

EDIT 2

Repopulated the two tables with 50k users and 1M orders

TRUNCATE TABLE orders;
TRUNCATE TABLE users;

INSERT INTO users (user_type)
SELECT 
    CASE (FLOOR(RAND() * 3) + 1) WHEN 1 THEN 'A' WHEN 2 THEN 'B' ELSE 'C' END AS `user_type`
FROM  (SELECT 1 FROM dummy LIMIT 50000) t;

INSERT INTO orders (user_id, date, total)
SELECT
     (FLOOR(RAND() * 50000) + 1),
     TIMESTAMPADD(SECOND, FLOOR(RAND() * TIMESTAMPDIFF(SECOND, '2016-01-01', '2021-12-13')), '2016-01-01'),
     ((FLOOR(RAND() * 50) + 1) * 5)
FROM (SELECT 1 FROM dummy LIMIT 1000000) t
ORDER BY date;

The resulting distribution of orders, by time and user_id, is quite even which is unlikely to be realistic so this test dataset grossly exacerbates any performance issues, I think.

I was surprised that by using my months table, ProGu's query was significantly faster, dropping from 21.062sec to 9.703sec, and using one less temporary table (two instead of three).

SELECT 
  months.month as label, 
  users.user_type, 
  SUM(
    EXISTS (
     SELECT 1
     FROM orders
     WHERE orders.user_id = users.id
     AND orders.`date` < months.month_start
    ) AND NOT EXISTS (
     SELECT 1
     FROM orders
     WHERE orders.user_id = users.id
     AND orders.`date` BETWEEN months.month_start AND months.month_end
    )
  ) counts
FROM (
    SELECT 'Jan' `month`, '2021-01-01' month_start, '2021-01-31' month_end UNION ALL
    SELECT 'Feb', '2021-02-01', '2021-02-28' UNION ALL
    SELECT 'Mar', '2021-03-01', '2021-03-31' UNION ALL
    SELECT 'Apr', '2021-04-01', '2021-04-30' UNION ALL
    SELECT 'May', '2021-05-01', '2021-05-31' UNION ALL
    SELECT 'Jun', '2021-06-01', '2021-06-30' UNION ALL
    SELECT 'Jul', '2021-07-01', '2021-07-31' UNION ALL
    SELECT 'Aug', '2021-08-01', '2021-08-31' UNION ALL
    SELECT 'Sep', '2021-09-01', '2021-09-30' UNION ALL
    SELECT 'Oct', '2021-10-01', '2021-10-31' UNION ALL
    SELECT 'Nov', '2021-11-01', '2021-11-30' UNION ALL
    SELECT 'Dec', '2021-12-01', '2021-12-31'
) months
CROSS JOIN users
GROUP BY months.month, users.user_type
ORDER BY months.month_start ASC, users.user_type ASC

My query above can be significantly improved by pre grouping the orders data for the current year (your mileage will vary but worth considering) -

SELECT
    months.month AS `label`,
    u.user_type,
    COUNT(u.id) AS `data`
FROM (
    SELECT 'Jan' `month`, '2021-01-01' month_start, '2021-01-31' month_end UNION ALL
    SELECT 'Feb', '2021-02-01', '2021-02-28' UNION ALL
    SELECT 'Mar', '2021-03-01', '2021-03-31' UNION ALL
    SELECT 'Apr', '2021-04-01', '2021-04-30' UNION ALL
    SELECT 'May', '2021-05-01', '2021-05-31' UNION ALL
    SELECT 'Jun', '2021-06-01', '2021-06-30' UNION ALL
    SELECT 'Jul', '2021-07-01', '2021-07-31' UNION ALL
    SELECT 'Aug', '2021-08-01', '2021-08-31' UNION ALL
    SELECT 'Sep', '2021-09-01', '2021-09-30' UNION ALL
    SELECT 'Oct', '2021-10-01', '2021-10-31' UNION ALL
    SELECT 'Nov', '2021-11-01', '2021-11-30' UNION ALL
    SELECT 'Dec', '2021-12-01', '2021-12-31'
) months
INNER JOIN users u
LEFT JOIN (
    SELECT `user_id`, DATE_FORMAT(`date`, '%Y-%m-01') AS `m`
    FROM `orders`
    WHERE `date` >= '2021-01-01'
    GROUP BY `user_id`, `m`
) o
    ON o.m = months.month_start
    AND o.user_id = u.id
WHERE o.user_id IS NULL
AND EXISTS (
    SELECT 1
    FROM orders o1 
    WHERE o1.date < months.month_start
    AND  o1.user_id = u.id
)
GROUP BY months.month, u.user_type
ORDER BY months.month_start ASC, u.user_type ASC

Execution time dropped from 12.422sec to 6.497sec

And the final test I tried was de-normalising by adding first_order_date to the users table -

ALTER TABLE `users` ADD COLUMN `first_order_date` DATE NULL AFTER `user_type`;

UPDATE users u
INNER JOIN (SELECT o.user_id, MIN(date) AS `first_o`, MAX(date) AS `last_o` FROM orders o GROUP BY o.user_id) t ON u.id = t.user_id
SET `u`.`first_order_date` = `t`.`first_o`, `u`.`last_order_date` = `t`.`last_o`;

I then modified my query to use this instead of the EXISTS sub-query -

SELECT
    `months`.`month` AS `label`,
    `u`.`user_type`,
    COUNT(`u`.`id`) AS `data`
FROM (
    SELECT 'Jan' `month`, '2021-01-01' month_start, '2021-01-31' month_end UNION ALL
    SELECT 'Feb', '2021-02-01', '2021-02-28' UNION ALL
    SELECT 'Mar', '2021-03-01', '2021-03-31' UNION ALL
    SELECT 'Apr', '2021-04-01', '2021-04-30' UNION ALL
    SELECT 'May', '2021-05-01', '2021-05-31' UNION ALL
    SELECT 'Jun', '2021-06-01', '2021-06-30' UNION ALL
    SELECT 'Jul', '2021-07-01', '2021-07-31' UNION ALL
    SELECT 'Aug', '2021-08-01', '2021-08-31' UNION ALL
    SELECT 'Sep', '2021-09-01', '2021-09-30' UNION ALL
    SELECT 'Oct', '2021-10-01', '2021-10-31' UNION ALL
    SELECT 'Nov', '2021-11-01', '2021-11-30' UNION ALL
    SELECT 'Dec', '2021-12-01', '2021-12-31'
) `months`
INNER JOIN `users` `u`
LEFT JOIN (
    SELECT `user_id`, DATE_FORMAT(`date`, '%Y-%m-01') AS `m`
    FROM `orders`
    WHERE `date` >= '2021-01-01'
    GROUP BY `user_id`, `m`
) o
    ON `o`.`m` =  `months`.`month_start`
    AND `o`.`user_id` = `u`.`id`
WHERE `o`.`user_id` IS NULL
AND `u`.`first_order_date` < `months`.`month_start`
GROUP BY `months`.`month`, `u`.`user_type`
ORDER BY `months`.`month_start` ASC, `u`.`user_type` ASC;

This returns the same result in 1.447sec. Obviously, de-normalising like this should be avoided but I included it here as it shows the performance benefit for this one scenario.

Upvotes: 0

ProDec

Reputation: 5410

Try this query to generate counts for all months x user type

SELECT 
  DATE_FORMAT(DATE(CONCAT_WS('-', YEAR(CURDATE()), months.mm, '01')), "%b") as label, 
  users.user_type, 
  SUM(
    EXISTS (
     SELECT 1
     FROM orders
     WHERE orders.user_id = users.userid
     AND orders.`date` < DATE(CONCAT_WS('-', YEAR(CURDATE()), months.mm, '01'))
    ) AND NOT EXISTS (
     SELECT 1
     FROM orders
     WHERE orders.user_id = users.userid
     AND orders.`date` BETWEEN DATE(CONCAT_WS('-', YEAR(CURDATE()), months.mm, '01')) AND LAST_DAY(DATE(CONCAT_WS('-', YEAR(CURDATE()), months.mm, '01')))
    )
  ) counts
FROM (
 SELECT '01' mm
 UNION SELECT '02' UNION SELECT '03' UNION SELECT '04' UNION SELECT '05' 
 UNION SELECT '06' UNION SELECT '07' UNION SELECT '08' UNION SELECT '09' 
 UNION SELECT '10' UNION SELECT '11' UNION SELECT '12'
) months
CROSS JOIN users
GROUP BY months.mm, users.user_type

demo

Upvotes: 1

VeteranSlayer

Reputation: 404

Test this query if it fits your needs

SELECT DATE_FORMAT(o.date, "%b") as label,
UPPER(u.user_type) as user_type,
COUNT(distinct o.user_id) as data FROM orders o 
JOIN users u ON o.user_id = u.userid 
WHERE DATE_FORMAT(o.date, "%Y") = "2021"
AND o.user_id NOT IN 
    (SELECT DISTINCT o1.user_id FROM orders o1 WHERE DATE_FORMAT(o1.date, "%b") = DATE_FORMAT(now(), "%b") AND YEAR(o1.date) = YEAR(now()) )
AND o.user_id IN 
    (SELECT DISTINCT o1.user_id FROM orders o1 WHERE (DATE_FORMAT(o1.date, "%c") < DATE_FORMAT(now(), "%c") OR YEAR(o1.date) < YEAR(now())))

GROUP BY DATE_FORMAT(o.date, "%Y %b"), 
u.user_type HAVING SUM(o.total) > 0 ORDER BY o.date ASC

EDIT The query below returns every month of the year

SELECT months.MONTH as label,
ifnull(UPPER(u.user_type), '-') as user_type,
COUNT(distinct o.user_id) as data 
FROM (
                      SELECT 1 AS MONTH
                       UNION SELECT 2 AS MONTH
                       UNION SELECT 3 AS MONTH
                       UNION SELECT 4 AS MONTH
                       UNION SELECT 5 AS MONTH
                       UNION SELECT 6 AS MONTH
                       UNION SELECT 7 AS MONTH
                       UNION SELECT 8 AS MONTH
                       UNION SELECT 9 AS MONTH
                       UNION SELECT 10 AS MONTH
                       UNION SELECT 11 AS MONTH
                       UNION SELECT 12 AS MONTH
            ) as months

LEFT JOIN orders o 
 ON DATE_FORMAT(o.date, "%c") = months.MONTH 
LEFT JOIN users u ON o.user_id = u.userid
WHERE (DATE_FORMAT(o.date, "%Y") = "2021" OR o.date IS NULL)
AND ( 
        (
        NOT EXISTS 
            (SELECT DISTINCT o1.user_id 
                FROM orders o1 
                    WHERE 
                    DATE_FORMAT(o1.date, "%b") = DATE_FORMAT(now(), "%b") 
                    AND YEAR(o1.date) = YEAR(now())
                    AND  o1.user_id = o.user_id
            )
        AND EXISTS
            (SELECT DISTINCT o1.user_id 
                FROM orders o1 
                    WHERE 
                    (DATE_FORMAT(o1.date, "%c") < DATE_FORMAT(now(), "%c") OR YEAR(o1.date) < YEAR(now())) AND  o1.user_id = o.user_id
            )
        ) 
        OR o.user_id IS null OR u.userid IS NULL
    )
GROUP BY months.MONTH, u.user_type ORDER BY months.MONTH ASC

Upvotes: 0

MySQL - Get users who have NO orders in current month but don have in previous, in one request

Answers (3)

Related Questions