Reputation: 35
I would like to find the day difference between the latest and the 2nd latest distinct order_id for each user.
The intended output would be:
user_id | order_diff
1 | 1
3 | 7
8 | 1
order_diff represents the difference in days between 2 distinct order_id values. In the event that there are no two distinct order_id values (as in the case for user_id 9), the result is not returned.
In this case, the order_diff for user_id 1 is 1, since the day difference between their two most recent distinct order_id values is 1. However, there is no order_diff for user_id 9, since that user does not have 2 distinct order_id values.
This is the dataset:
user_id order_id order_time
1 208965785 2016-12-15 17:14:13
1 201765785 2016-12-14 17:19:05
1 203932785 2016-12-13 20:41:30
1 209612785 2016-12-14 20:14:32
1 208112785 2016-12-14 20:27:08
1 205525785 2016-12-14 17:01:26
1 208812785 2016-12-14 20:18:23
1 206432785 2016-12-11 20:32:20
1 206698785 2016-12-14 10:50:15
2 209524795 2016-11-26 18:06:21
3 206529925 2016-10-01 10:43:57
3 203729925 2016-10-08 10:43:11
4 204876145 2016-09-24 10:23:49
5 203363157 2016-07-13 23:56:43
6 207784875 2017-01-04 12:21:21
7 206437177 2016-06-25 02:40:33
8 202819645 2016-09-09 11:47:27
8 202819645 2016-09-09 11:47:27
8 202819646 2016-09-08 11:47:27
9 205127187 2016-06-05 22:21:18
9 205127187 2016-06-05 22:21:18
11 207874877 2016-06-17 16:49:44
12 204927595 2016-11-28 23:05:40
This is the code that I am currently using:
SELECT e1.user_id,datediff(e1.order_time,e2.time), e1.order_id FROM
sales e1
JOIN
sales e2
ON
e1.user_id=e2.user_id
AND
e1.order_id = (SELECT distinct order_id FROM sales temp1 WHERE temp1.order_id =e1.order_id ORDER BY order_time DESC LIMIT 1)
AND
e2.order_id = (SELECT distinct order_id FROM sales temp2 WHERE temp2.order_id=e2.order_id ORDER BY order_time DESC LIMIT 1 OFFSET 1)
My query does not produce the desired output, and it also ignores the cases where the order_ids are the same.
Edit: I would also like the query to extend to larger datasets, where the 2nd most recent order_time may not be the min(order_time).
Upvotes: 0
Views: 192
Reputation: 60482
Based on your fiddle:
select user_id,
       datediff(max(order_time),
                ( -- Scalar Subquery to get the 2nd largest order_time
                  select max(order_time)
                  from orders as o2
                  where o2.user_id = o.user_id -- same user
                    and o2.order_time < max(o.order_time) -- but not the max time
                )
               ) as diff
from orders as o
group by user_id
having diff is not null -- if there's no 2nd largest time diff will be NULL
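To check the logic outside the database, here is a minimal Python sketch of the same idea: per user, take the latest order_time, then the largest order_time strictly below it, and diff the calendar dates the way DATEDIFF does. The `ROWS` list is a trimmed copy of the sample data and `second_latest_diff` is a hypothetical helper name, not part of the query above.

```python
from collections import defaultdict
from datetime import datetime

# Trimmed copy of the sample rows: (user_id, order_id, order_time)
ROWS = [
    (1, 208965785, "2016-12-15 17:14:13"),
    (1, 208112785, "2016-12-14 20:27:08"),
    (1, 206432785, "2016-12-11 20:32:20"),
    (2, 209524795, "2016-11-26 18:06:21"),
    (3, 206529925, "2016-10-01 10:43:57"),
    (3, 203729925, "2016-10-08 10:43:11"),
    (8, 202819645, "2016-09-09 11:47:27"),
    (8, 202819645, "2016-09-09 11:47:27"),
    (8, 202819646, "2016-09-08 11:47:27"),
    (9, 205127187, "2016-06-05 22:21:18"),
    (9, 205127187, "2016-06-05 22:21:18"),
]


def second_latest_diff(rows):
    """Day difference between the max order_time and the next lower one."""
    times = defaultdict(set)
    for user_id, _order_id, order_time in rows:
        times[user_id].add(datetime.strptime(order_time, "%Y-%m-%d %H:%M:%S"))

    result = {}
    for user_id, stamps in times.items():
        latest = max(stamps)
        # Same role as the scalar subquery: times strictly below the max
        earlier = [t for t in stamps if t < latest]
        if earlier:  # mirrors "having diff is not null"
            # DATEDIFF compares calendar dates only, so drop the time part
            result[user_id] = (latest.date() - max(earlier).date()).days
    return result


print(second_latest_diff(ROWS))  # {1: 1, 3: 7, 8: 1}
```

Note that this compares order_time values rather than order_id values, so two different orders placed at exactly the same timestamp would count as one.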
Upvotes: 1
Reputation: 28844
The following would work:
Schema (MySQL v5.7)
CREATE TABLE orders
(`user_id` int, `order_id` int, `order_time` datetime)
;
INSERT INTO orders
(`user_id`, `order_id`, `order_time`)
VALUES
(1,208965785,'2016-12-15 17:14:13'),
(1,201765785,'2016-12-14 17:19:05'),
(1,203932785,'2016-12-13 20:41:30'),
(1,209612785,'2016-12-14 20:14:32'),
(1,208112785,'2016-12-14 20:27:08'),
(1,205525785,'2016-12-14 17:01:26'),
(1,208812785,'2016-12-14 20:18:23'),
(1,206432785,'2016-12-11 20:32:20'),
(1,206698785,'2016-12-14 10:50:15'),
(2,209524795,'2016-11-26 18:06:21'),
(3,206529925,'2016-10-01 10:43:57'),
(3,203729925,'2016-10-08 10:43:11'),
(4,204876145,'2016-09-24 10:23:49'),
(5,203363157,'2016-07-13 23:56:43'),
(6,207784875,'2017-01-04 12:21:21'),
(7,206437177,'2016-06-25 02:40:33'),
(8,202819645,'2016-09-09 11:47:27'),
(8,202819645,'2016-09-09 11:47:27'),
(8,202819646,'2016-09-08 11:47:27'),
(9,205127187,'2016-06-05 22:21:18'),
(9,205127187,'2016-06-05 22:21:18'),
(11,207874877,'2016-06-17 16:49:44'),
(12,204927595,'2016-11-28 23:05:40');
Query #1
SELECT dt2.user_id,
       MIN(datediff(dt2.latest_order_time,
                    dt2.second_latest_order_time)) AS order_diff
FROM (
  SELECT o.user_id,
         o.order_time AS latest_order_time,
         (SELECT o2.order_time
          FROM orders AS o2
          WHERE o2.user_id = o.user_id AND
                o2.order_id <> o.order_id
          ORDER BY o2.order_time DESC LIMIT 1) AS second_latest_order_time
  FROM orders AS o
  JOIN (SELECT user_id, MAX(order_time) AS latest_order_time
        FROM orders
        GROUP BY user_id) AS dt
    ON dt.user_id = o.user_id AND
       dt.latest_order_time = o.order_time
) AS dt2
WHERE dt2.second_latest_order_time IS NOT NULL
GROUP BY dt2.user_id;
| user_id | order_diff |
| ------- | ---------- |
| 1 | 1 |
| 3 | 7 |
| 8 | 1 |
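The logic of Query #1 can also be mirrored in plain Python as a sanity check. This is only a sketch under assumptions: `ROWS` is a trimmed copy of the inserted data and `order_diff_by_order_id` is a hypothetical helper name. It keeps the rows holding each user's latest order_time, looks up the most recent row with a different order_id, and takes the minimum date difference per user.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# Trimmed copy of the sample rows: (user_id, order_id, order_time)
ROWS = [
    (1, 208965785, "2016-12-15 17:14:13"),
    (1, 208112785, "2016-12-14 20:27:08"),
    (1, 206432785, "2016-12-11 20:32:20"),
    (2, 209524795, "2016-11-26 18:06:21"),
    (3, 206529925, "2016-10-01 10:43:57"),
    (3, 203729925, "2016-10-08 10:43:11"),
    (8, 202819645, "2016-09-09 11:47:27"),
    (8, 202819645, "2016-09-09 11:47:27"),
    (8, 202819646, "2016-09-08 11:47:27"),
    (9, 205127187, "2016-06-05 22:21:18"),
    (9, 205127187, "2016-06-05 22:21:18"),
]


def order_diff_by_order_id(rows):
    parsed = [(u, o, datetime.strptime(t, FMT)) for u, o, t in rows]

    # Derived table dt: latest order_time per user
    latest = {}
    for user_id, _order_id, order_time in parsed:
        if user_id not in latest or order_time > latest[user_id]:
            latest[user_id] = order_time

    result = {}
    for user_id, order_id, order_time in parsed:
        if order_time != latest[user_id]:
            continue  # the join keeps only rows at the latest order_time
        # Correlated subquery: same user, different order_id, newest first
        others = [t for u, o, t in parsed if u == user_id and o != order_id]
        if not others:
            continue  # second_latest_order_time IS NULL
        diff = (order_time.date() - max(others).date()).days
        # MIN(...) with GROUP BY collapses duplicate latest rows
        result[user_id] = min(diff, result.get(user_id, diff))
    return result


print(order_diff_by_order_id(ROWS))  # {1: 1, 3: 7, 8: 1}
```

Because the comparison is on order_id rather than order_time, user 9 (the same order_id twice) drops out, while user 8's duplicated latest row is collapsed by the final minimum.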
Details:
- Find the maximum order_time for a user_id in a sub-select query (Derived Table). We can alias it as latest_order_time.
- Join this result-set to the orders table. This will help us in considering only the row(s) with maximum value of order_time for a user_id.
- Use a correlated subquery to get the latest order_time value for the same user, out of the rest of order_id value(s). We can alias it as second_latest_order_time.
- Use this result as a Derived Table again, filter out the rows where second_latest_order_time is null, and calculate datediff() for the rest.
- Group By is needed, as your data has multiple entries for a
Upvotes: 1
Reputation: 5240
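Here is a solution, valid only when the second-latest order is also the user's earliest one (it measures the span from MIN to MAX order_time):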
SELECT user_id,
DATEDIFF(MAX(order_time), MIN(order_time)) as order_diff
FROM orders
GROUP BY user_id
HAVING order_diff > 0;
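It is worth checking how the MAX/MIN approach behaves for a user with more than two order dates. The following Python sketch mirrors the query on a trimmed copy of the sample rows; `max_min_diff` is a hypothetical helper name. For user 1 it yields the span from the earliest to the latest order (4 days), not the gap between the two most recent orders (1 day).

```python
from collections import defaultdict
from datetime import datetime

# Trimmed copy of the sample rows: (user_id, order_id, order_time)
ROWS = [
    (1, 208965785, "2016-12-15 17:14:13"),
    (1, 208112785, "2016-12-14 20:27:08"),
    (1, 206432785, "2016-12-11 20:32:20"),
    (2, 209524795, "2016-11-26 18:06:21"),
    (3, 206529925, "2016-10-01 10:43:57"),
    (3, 203729925, "2016-10-08 10:43:11"),
    (8, 202819645, "2016-09-09 11:47:27"),
    (8, 202819645, "2016-09-09 11:47:27"),
    (8, 202819646, "2016-09-08 11:47:27"),
    (9, 205127187, "2016-06-05 22:21:18"),
    (9, 205127187, "2016-06-05 22:21:18"),
]


def max_min_diff(rows):
    """DATEDIFF(MAX(order_time), MIN(order_time)) per user, keeping diffs > 0."""
    dates = defaultdict(list)
    for user_id, _order_id, order_time in rows:
        parsed = datetime.strptime(order_time, "%Y-%m-%d %H:%M:%S")
        dates[user_id].append(parsed.date())

    result = {}
    for user_id, days in dates.items():
        diff = (max(days) - min(days)).days
        if diff > 0:  # mirrors "HAVING order_diff > 0"
            result[user_id] = diff
    return result


print(max_min_diff(ROWS))  # {1: 4, 3: 7, 8: 1}
```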
Upvotes: 0